Abstract

Conjunctive database queries have been extended with a mechanism
for object creation to capture important applications such as data
exchange, data integration, and ontology-based data access. Object creation
generates new object identifiers in the result that do not belong to the
set of constants in the source database. The new object identifiers can
also be seen as Skolem terms. Hence, object-creating conjunctive queries can
also be regarded as restricted second-order tuple-generating dependencies
(SO tgds), considered in the data exchange literature.

In this paper, we focus on the class of single-function object-creating 
conjunctive queries, or sifo CQs for short. The single function symbol can 
be used only once in the head of the query. We give a new characterization 
for oid-equivalence of sifo CQs that is simpler than the one given by Hull 
and Yoshikawa and places the problem in the complexity class NP. Our 
characterization is based on Cohen’s equivalence notions for conjunctive 
queries with multiplicities. We also solve the logical entailment problem
for sifo CQs, showing that this problem also belongs to NP. Results
by Pichler et al. have shown that logical equivalence for more general 
classes of SO tgds is either undecidable or decidable with as yet unknown 
complexity upper bounds. 


1 Introduction 

Conjunctive queries form a natural class of database queries, which can be 
defined by combinations of selection, renaming, natural join, and projection. 
Much of the research on database query processing is focused on conjunctive 
queries; moreover, these queries are amenable to advanced optimizations
because containment of conjunctive queries is decidable (though NP-complete).
In this paper, we are interested in conjunctive queries extended with a facility 
for object creation. 

Object creation, also called oid generation or value invention, has been
repeatedly proposed and investigated as a feature of query languages. This has
happened in several contexts: high expressiveness; object orientation;
data integration; semi-structured data and XML; and data exchange.
In a logic-based approach, object creation is typically
achieved through the use of Skolem functions [25].

In the present paper, we consider conjunctive queries (CQs) extended with 
object creation through the use of a single Skolem function, which can be used 
only once in the head of the query. We refer to such a query as a ‘sifo CQ’ 
(for single-function object-creating). The following example of a sifo CQ uses a 
Skolem function f:

Q : Family(c, f(x,y)) ← Mother(c,x), Father(c,y).

The query introduces a new oid f(x,y) for every pair (x,y) of a woman x 
and a man y who have at least one child together; all children c of x and y 
are linked to the new oid in the result of the query (a relation called Family). 
As an example, if Mother(beth, anne) and Father(beth, adam) are two facts
in the underlying database, then the result of the query includes the fact
Family(beth, f(anne, adam)), where f(anne, adam) is the newly created oid.
This oid will be shared by all the children having anne and adam as parents. 

In this paper, we first revisit the problem of checking oid-equivalence of sifo 
CQs. Oid-equivalence has its origins in the theory of object-creating queries 
introduced by Abiteboul and Kanellakis [3]; it is the natural generalization of 
query equivalence in the presence of object creation. 

Consider for instance the following sifo CQ: 

Q' : Family(c, g(x,y,x)) ← Mother(c,x), Father(c,y).

It is not hard to see that the result of Q' has the same structure as the result of 
the query Q above. The query Q' links all children c of the parents x and y to 
the oid g(x,y,x) that depends exactly on x and y. That is, two children in the 
result of Q are connected to the same oid if and only if they are connected to
the same oid in Q', although the oids will be syntactically different. Therefore, we
can conclude that Q and Q' are oid-equivalent, which means that their results 
are identical on any input up to a simple isomorphism mapping the oids in one 
result to those in the other. 

Hull and Yoshikawa [23] studied oid-equivalence (they called it ‘obscured 
equivalence’) for nonrecursive ILOG programs; the decidability of this problem 
is a long-standing open question. Nevertheless, for the case of ‘isolated oid 
creation’, to which sifo CQs belong, they have given a decidable characterization. 

We give a new result relating oid-equivalence to equivalence of classical
conjunctive queries under ‘combined’ bag-set semantics [14], which models the
evaluation of CQs when query results and relations may contain duplicates of tuples.
As a corollary, we obtain that oid-equivalence for sifo CQs belongs to NP, which
does not follow from the Hull-Yoshikawa test. Obviously, then, oid-equivalence
for sifo CQs is NP-complete, since equivalence of classical CQs without object
creation is already NP-complete.

Object creation is receiving renewed interest in the context of schema
mappings, which are formalisms describing how data structured under a
source schema are to be transformed into data structured under a target schema.
Hence, it is instructive to view sifo CQs as schema mappings, simply by
interpreting them as implicational statements. As an example, we may view query Q
above as an implicational statement that relates a query over relations Mother
and Father in the source schema to the relation Family in the target schema.

For standard CQs without object creation, two queries are equivalent if and 
only if they are logically equivalent as schema mappings. For sifo CQs, we
show that oid-equivalence implies logical equivalence, while the converse is not 
true. 

Sifo CQs viewed as schema mappings belong to the class of so-called ‘nested
dependencies’ [5], which belong in turn to the class of formulas called
second-order tuple-generating dependencies (SO-tgds [18]). For instance, consider again
the sifo CQ Q above: it can be rewritten into the following SO-tgd:

∃f ∀x ∀y ∀c (Mother(c, x) ∧ Father(c, y) → Family(c, f(x, y))),

which is of second order because the function f is existentially quantified.

Although logical equivalence of SO-tgds is undecidable, logical
implication of nested dependencies has recently been shown to be decidable [26]. We
give a novel and elegant characterization of logical implication for sifo CQs which 
is simpler than the general implication test for nested dependencies. It turns 
out that the problem belongs to NP. Hence, logical implication for sifo CQs 
has no worse complexity than containment for standard CQs without object 
creation. 

Summarizing, in this paper we provide the following contributions in the 
area of query languages with object creation: 

1. We clarify the relationship between sifo CQs and other formalisms in the 
literature, notably, the language ILOG [22], second-order tuple-generating
dependencies [18], and nested tuple-generating dependencies [8].

2. We relate the problem of oid-equivalence for sifo CQs to the equivalence 
of classical conjunctive queries under combined bag-set semantics, which 
implies its NP-completeness. 

3. We show that when sifo CQs are interpreted as schema mappings, oid- 
equivalence implies logical equivalence but not vice versa. 

4. We provide a new characterization of logical implication for sifo CQs as 
object-creating queries showing that this problem has the same complexity 
as deciding containment for classical CQs. 

This paper is organized as follows. In Section 2 we review some practical
applications of sifo CQs. In Section 3 we formally define object-creating
conjunctive queries. Section 4 is devoted to the results on oid-equivalence.
Section 5 is devoted to the results on logical entailment. In Section 6 we conclude
by discussing related work and topics for further research.


2 Applications of sifo CQs

In this section, we discuss further applications of sifo CQs, which may
constitute important components of many advanced database systems, ranging from
information integration and schema mapping engines, along with their
benchmarks, to several Semantic Web tools. We believe this shows that the results in
this article on equivalence and logical implication of sifo CQs are relevant and
contribute to our understanding of how solutions for these applications can be
optimized.

GAV (global-as-view) schema mappings [20, 27, 33] relate a query over the
source schema, represented by a body B of a CQ, to an atomic element of the 
global schema, represented by a head atom H of a CQ. More precisely, a GAV 
mapping can be written as follows: 

T(x) ← B

where we use a relation symbol T as the atomic head predicate. 

GAV schema mappings have been used already in the 1990s in mediator 
systems like Tsimmis [30, 33] or Information Manifold [28] for the integration
of heterogeneous data sources. In both systems, source facts are related to facts 
over the global schema by means of queries. 

Sifo CQs can naturally be seen as extensions of GAV mappings, when one 
of the attributes of the global schema carries newly created identifiers. For 
instance, the sifo CQ Q from Section 1 can express a mapping from a source
schema containing two relations Mother and Father to one relation Family of 
a global schema, with created identifiers for families appearing in the tuples in 
the result of the mapping. Thus, we can also interpret Q as an extended GAV 
schema mapping. 

Another important application of sifo CQs is schema mapping benchmarks,
which allow users to compare and evaluate schema mapping systems. In
particular, the flexibility of the arguments of the Skolem functions used for object
creation has been advocated as one of the desirable features in recent
benchmarks for schema mapping and information integration, such as STBenchmark
[6] and iBench [9].

More precisely, in the mapping primitives of iBench [9], an extension of
STBenchmark [6] that supports SO-tgds, users can choose between two
different skolemization strategies to fill the arguments of the Skolem functions:
fixed, where the arguments of the function are pre-defined in a native mapping
primitive, or variable, where one can further choose among the options All, Key,
and Random, which generate mappings where all variables, the variables in the
positions of the primary key, or a random set of variables, respectively, are used
as arguments of the function.

These skolemization strategies can be captured by sifo CQs as follows. In 
the query below: 

T(x, y, f(x, y, z, w)) ← B(x, y, z, w)

we can observe that the Skolem term uses all the source variables in the body
B (option All). If the attribute in the position of x is a primary key for B, then
the application of the option Key generates a mapping that can be expressed
by the sifo CQ

T(x, y, f(x)) ← B(x, y, z, w).

Alternatively, choosing the option Random may lead iBench to randomly
select the attributes in the positions of x and z, and then to generate the
mapping represented by

T(x, y, f(x, z)) ← B(x, y, z, w).

It is also worth highlighting that three out of the seven mapping primitives
in iBench that are novel with respect to STBenchmark, namely ADD (copy
a relation and ADD new attributes), ADL (copy a relation, Add and DeLete
attributes in tandem) and MA (Merge and Add new attributes) contain single
Skolem functions. They correspond to the following sifo CQs, respectively:

T(x, y, f(x, y)) ← B(x, y)

T(x, f(x)) ← B(x, y)

T(x, y, z, f(x, y, z)) ← B(x, y), T(y, z).
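
As a rough illustration of how the options All, Key, and Random determine the arguments of the Skolem term, consider the following Python sketch. It is not iBench code: the helper names skolem_arguments and render_sifo_cq are hypothetical, and the assumption that Random selects a non-empty subset of the body variables is ours.

```python
import random

def skolem_arguments(body_vars, key_vars, strategy, rng=None):
    """Pick the arguments of the Skolem term (hypothetical helper, not iBench code)."""
    if strategy == "All":
        return list(body_vars)
    if strategy == "Key":
        return [v for v in body_vars if v in key_vars]
    if strategy == "Random":
        rng = rng or random.Random()
        size = rng.randint(1, len(body_vars))      # assumption: non-empty subset
        chosen = set(rng.sample(list(body_vars), size))
        return [v for v in body_vars if v in chosen]
    raise ValueError(f"unknown strategy: {strategy}")

def render_sifo_cq(head_vars, body_vars, skolem_args):
    """Render T(head_vars, f(skolem_args)) <- B(body_vars) as a string."""
    head = ", ".join(list(head_vars) + [f"f({', '.join(skolem_args)})"])
    return f"T({head}) <- B({', '.join(body_vars)})"

body = ["x", "y", "z", "w"]
option_all = render_sifo_cq(["x", "y"], body, skolem_arguments(body, {"x"}, "All"))
option_key = render_sifo_cq(["x", "y"], body, skolem_arguments(body, {"x"}, "Key"))
print(option_all)
print(option_key)
```

With x as the key position, the two printed queries coincide with the All and Key mappings shown earlier in this section.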

A third significant application of sifo CQs is the Semantic Web, where sifo
CQs can be envisioned in at least two scenarios, namely in systems for
ontology-based data access (OBDA) and in direct mappings from the relational to the
RDF data format, under development at W3C.¹ Indeed, newly created
identifiers in the head of a sifo CQ can serve as generated keys, or simply as newly
invented values needed to fill an attribute of a relation in the global schema.
As such, sifo CQs can be seen as examples of mapping assertions from source
schemas to a global ontology in OBDA. Typically, OBDA mapping
assertions relate facts in relational source schemas to RDF triples in a global ontology.
The newly generated IRIs² in the RDF triples can be interpreted as skolemized
values in the global ontology.

A related application is the direct translation of a relational schema into 
OWL, which uses as an important building block the creation of IRIs [32]. In 
contrast to the previous application, this application handles relational schemas 
that are not known in advance. For each relation r in a database schema, 
Datalog-like rules can be used to generate an IRI for the relation r and an IRI 
for each attribute a in r. We take an example of a translation from a relational 
schema into OWL and we show that, actually, these Datalog-like rules can be 
viewed as sifo CQs, since they employ a single concatenation function to obtain 
such IRIs (exemplified as f). The corresponding sifo CQs are reported below:

T₁(r, f(b, r)) ← B₁(r)

T₂(a, r, f(b, r, a)) ← B₂(r, a),

where B₁ and B₂ are conjunctive query bodies retrieving relation names r
and attribute names a from the data dictionary of an underlying relational
database, and where b is a string representing a given IRI base (e.g., the string
‘http://example.edu/db’) for the same database to be translated. Thus, the first
query creates a new IRI for the relation r, by concatenating b with the relation
symbol r, while the second query returns the set of IRIs of the attributes a of
r, by concatenating b with the relation symbol r and its attribute symbols a.

1 http://www.w3.org/TR/rdb-direct-mapping/

2 IRIs stand for Internationalized Resource Identifiers and extend the syntax of URIs
(Uniform Resource Identifiers) to a much wider repertoire of characters. They naturally embody
global identifiers that refer to the same resource on the Web and can be used across different
mapping assertions to refer to that resource.
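
The two IRI-generating queries above can be mimicked by a small Python sketch in which the function f is plain string concatenation. The '/' separator and the catalog content are assumptions made for illustration only; the sketch does not reproduce the exact IRI syntax of the W3C direct mapping.

```python
def f(*parts):
    """Concatenation playing the role of the function f (the '/' separator is an assumption)."""
    return "/".join(parts)

def relation_iris(base, relations):
    """First query: pair every relation name r with the IRI f(b, r)."""
    return {(r, f(base, r)) for r in relations}

def attribute_iris(base, attributes):
    """Second query: pair every attribute a of r with the IRI f(b, r, a)."""
    return {(a, r, f(base, r, a)) for r, a in attributes}

base = "http://example.edu/db"
catalog = {"Student": ["id", "name"]}          # hypothetical data dictionary
pairs = [(r, a) for r, attrs in catalog.items() for a in attrs]
print(relation_iris(base, catalog))
print(attribute_iris(base, pairs))
```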


3 Preliminaries 

In this section we introduce our formalism for dealing with conjunctive queries 
and introduce the notion of object-creating conjunctive query, adapted from the 
language ILOG [22].

3.1 Databases and conjunctive queries 

From the outset we assume a supply of relation names, where each relation
name R has an associated arity ar(R). We also assume an infinite domain dom
of atomic data elements called constants. A fact is of the form R(a₁, ..., aₖ)
where a₁, ..., aₖ are constants and R is a k-ary relation name. We call R the
predicate of the fact.

A database schema S is a finite set of relation names. An instance of S is a 
finite set of facts with predicates from S. The set of all constants appearing in 
an instance I is called the active domain of I and denoted by adom(I).

We further assume an infinite supply of variables, disjoint from dom. An
atom is of the form R(x₁, ..., xₖ) where x₁, ..., xₖ are variables and R is a k-ary
relation name. As with facts, we call R the predicate of the atom.

We can now recall the classical notion of conjunctive query (CQ) [2, 13].
Syntactically, a CQ over a database schema S is of the form

H ← B,

where B is a finite set of atoms with predicates from S, and H is an atom with 
a predicate not in S. The set B is called the body and H is called the head. It is 
required that every variable occurring in the head also occurs in the body. We 
denote the set of variables occurring in a set of atoms B (or a single atom A) 
by var(B) (or var(A)).

The semantics of CQs is defined in terms of valuations. A valuation is a
mapping α : X → dom on some finite set of variables X. When A is an atom
with var(A) ⊆ X, we can apply α to A simply by applying α to every variable
in A. This results in a fact and is denoted by α(A). When B is a set of atoms
and α is a valuation on var(B), we can apply α to B by applying α to every
atom in B. Formally, α(B) is defined as the instance {α(A) | A ∈ B}.

When I is an instance and α is a valuation on var(B) such that α(B) ⊆ I,
we say that α is a matching of B in I, and denote this by α : B → I. Now when
Q is a CQ H ← B and I is an instance, the result of Q on I is defined as

Q(I) := {α(H) | α : B → I}.
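
The definition of Q(I) can be read operationally: enumerate all matchings of the body and apply each of them to the head. The following Python sketch does this by naive backtracking. The encoding of atoms and facts as pairs (predicate, tuple of arguments) and the query Parents are our own illustrative assumptions.

```python
def matchings(body, instance, valuation=None):
    """Yield every valuation alpha with alpha(body) contained in instance."""
    valuation = valuation or {}
    if not body:
        yield dict(valuation)
        return
    (pred, args), rest = body[0], body[1:]
    for fact_pred, fact_args in instance:
        if fact_pred != pred or len(fact_args) != len(args):
            continue
        extended = dict(valuation)
        # setdefault binds a fresh variable or returns its earlier binding.
        if all(extended.setdefault(v, c) == c for v, c in zip(args, fact_args)):
            yield from matchings(rest, instance, extended)

def evaluate(head, body, instance):
    """Q(I) = {alpha(H) | alpha : B -> I} for the CQ  H <- B."""
    pred, args = head
    return {(pred, tuple(alpha[v] for v in args))
            for alpha in matchings(body, instance)}

# Hypothetical CQ  Parents(x, y) <- Mother(c, x), Father(c, y)
instance = {("Mother", ("beth", "anne")), ("Father", ("beth", "adam")),
            ("Mother", ("eric", "claire"))}
head = ("Parents", ("x", "y"))
body = [("Mother", ("c", "x")), ("Father", ("c", "y"))]
result = evaluate(head, body, instance)
print(result)
```

Since eric has no Father fact in this small instance, only the pair (anne, adam) is returned.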

3.2 Object-creating conjunctive queries 

Assume a finite vocabulary of function symbols of various arities. As with
relation names, the arity of a function symbol f is denoted by ar(f).

Data terms are syntactical expressions built up from constants using function
symbols. Formally, data terms are inductively defined as follows:

1. Every constant is a data term;

2. If f is a k-ary function symbol and d₁, ..., dₖ are data terms, then the
expression f(d₁, ..., dₖ) is also a data term.³

An extended fact is defined just like a fact, except that it may contain data
terms rather than only constants. Formally, an extended fact is of the form
R(d₁, ..., dₖ), where d₁, ..., dₖ are data terms and R is a k-ary relation name.
The active domain of an extended fact e = R(d₁, ..., dₖ) is defined as

adom(e) := {d₁, ..., dₖ}.

An extended instance is a finite set of extended facts. The active domain of an
extended instance J is defined as

adom(J) := ⋃_{e ∈ J} adom(e).

Formula terms are defined in the same way as data terms, but are built up 
from variables rather than constants. Extended atoms are defined like atoms, 
but can contain formula terms in addition to variables. If t is a formula term
and α is a valuation defined on all variables occurring in t, we can apply α to
every variable occurrence in t, obtaining a data term α(t). Likewise, we can
apply a valuation to an extended atom, resulting in an extended fact.

We are now ready to define the syntax and semantics of object-creating
conjunctive queries (oCQs). Like a classical CQ, an oCQ is of the form H ← B.
The only difference with a classical CQ is that H can be an extended atom;
in particular, B is still a finite set of “flat” atoms, not extended atoms. It is
still required that var(H) ⊆ var(B). The result of an oCQ Q = H ← B on an
instance I is now an extended instance, defined as

Q(I) := {α(H) | α : B → I}.


3 Since constants are atomic data elements, no constant is allowed to be of the form

Mother            Father            Family
beth  anne        beth  adam        beth  f(anne, adam)
ben   anne        ben   adam        ben   f(anne, adam)
eric  claire      eric  carl        eric  f(claire, carl)
emma  diana       emma  carl        emma  f(diana, carl)
dave  diana

Table 1: Instances used in Example 3.1


R                 T
a  b  c           a  f(b)
a  b  d           c  f(b)
c  b  d           d  f(c)
d  c  a

Table 2: Instances used in Example 3.2



Example 3.1. Recall the oCQ Q from the Introduction:

Family(c, f(x, y)) ← Mother(c, x), Father(c, y).

If I is the instance consisting of the Mother and Father facts listed in Table 1,
then Q(I) is the extended instance consisting of the extended Family facts listed
in the same table.

Example 3.2. For a more abstract example, consider the following oCQ Q:

T(x, f(y)) ← R(x, y, z).

If I is the instance consisting of the R-facts listed in Table 2, then Q(I) consists
of the extended T-facts listed in the same table.
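
Example 3.2 can be replayed mechanically. In the Python sketch below, which is only one possible encoding, a formula term is a pair (symbol, tuple of arguments), a variable or constant is a string, and applying a valuation to the head yields the extended facts.

```python
def matchings(body, instance, valuation=None):
    """Yield every valuation alpha with alpha(body) contained in instance."""
    valuation = valuation or {}
    if not body:
        yield dict(valuation)
        return
    (pred, args), rest = body[0], body[1:]
    for fact_pred, fact_args in instance:
        if fact_pred != pred or len(fact_args) != len(args):
            continue
        extended = dict(valuation)
        if all(extended.setdefault(v, c) == c for v, c in zip(args, fact_args)):
            yield from matchings(rest, instance, extended)

def apply(term, alpha):
    """Apply alpha to a variable, or recursively to a formula term (symbol, args)."""
    if isinstance(term, tuple):
        symbol, args = term
        return (symbol, tuple(apply(t, alpha) for t in args))
    return alpha[term]

def evaluate_ocq(head, body, instance):
    """Result of the oCQ  H <- B: apply every matching of B to the extended atom H."""
    pred, args = head
    return {(pred, tuple(apply(t, alpha) for t in args))
            for alpha in matchings(body, instance)}

I = {("R", ("a", "b", "c")), ("R", ("a", "b", "d")),
     ("R", ("c", "b", "d")), ("R", ("d", "c", "a"))}
head = ("T", ("x", ("f", ("y",))))     # T(x, f(y))
body = [("R", ("x", "y", "z"))]
result = evaluate_ocq(head, body, I)
for fact in sorted(result):
    print(fact)
```

The four R-facts yield three extended T-facts, because the first two R-facts agree on x and y.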

3.3 The single-function case 

In this paper, we focus on single-function oCQs (sifo CQs), which have exactly
one occurrence of a function symbol in the head. Without loss of generality we 
always place the function term in the last position of the head. 

Definition 3.3. A sifo CQ over a database schema S is an oCQ over S of the
form

T(x̄, f(z̄)) ← B

where

• T is the head predicate;

• f is a function symbol;

• B is the body;

• x̄ is a tuple of (not necessarily distinct) variables from var(B), called the
distinguished variables;

• z̄ is a tuple of (not necessarily distinct) variables from var(B), called the
creation variables; some creation variables may be distinguished;

• The elements of var(B) that are not distinguished are called the
non-distinguished variables.


Family
beth  g(anne, adam, anne)
ben   g(anne, adam, anne)
eric  g(claire, carl, claire)
emma  g(diana, carl, diana)

Table 3: Instance used in Example 4.1


Example 3.4. The queries in Examples 3.1 and 3.2 are both examples of sifo
CQs.

3.4 Comparison with ILOG 

Object-creating CQs can be considered to be the conjunctive-query fragment 
of nonrecursive ILOG [22]: our syntax exposes the Skolem functions, which are
normally obscured in the standard ILOG syntax, and our semantics corresponds
to what is called the ‘exposed semantics’ by Hull and Yoshikawa. Nevertheless,
in the following section, we will consider oid-equivalence of sifo CQs, which does
correspond to what has been called ‘obscured equivalence’ [23].

4 Characterization of oid-equivalence for sifo CQs 

4.1 Oid-equivalence of oCQs 

The result Q(I) of an oCQ Q applied to an instance I is an extended instance.
The data terms in adom(Q(I)) that are not constants play the role of created
oids (also called invented values). Intuitively it is clear that the actual form of 
the created oids does not matter. 

Example 4.1. Recall the query Q from Example 3.1:

Family(c, f(x, y)) ← Mother(c, x), Father(c, y).

As mentioned in the Introduction, we could have equivalently used the following
query Q':

Family(c, g(x, y, x)) ← Mother(c, x), Father(c, y).

Applying the above query to the Mother and Father facts from Table 1 results
in the instance shown in Table 3. Intuitively, this instance has exactly the same
relevant properties as the Family-instance from Table 1: beth and ben are linked
to the same family-oid; eric is linked to another oid; and emma to still another
one. □

I                 Q(I)              Q'(I)
a  b  c           a  f(b)           a  f(a, b)
d  b  e           d  f(b)           d  f(d, b)

Table 4: Instances used in Example 4.7


We formalize this intuition in the following definitions. 

Definition 4.2. Let J be an extended instance.

• The set adom(J) − dom is denoted by oids(J);

• The set adom(J) ∩ dom is denoted by consts(J).

Definition 4.3. Let J be an extended instance and let p be a mapping from
adom(J) to the set of data terms. For any extended fact e = R(d₁, ..., dₖ) in
J, we define p(e) to be the extended fact R(p(d₁), ..., p(dₖ)). We then define
p(J) := {p(e) | e ∈ J}.

Definition 4.4. Let J₁ and J₂ be extended instances. Then J₁ and J₂ are
called oid-isomorphic if there exists a bijection p : adom(J₁) → adom(J₂) such
that

• p is the identity on consts(J₁);

• p maps oids(J₁) to oids(J₂);

• p(J₁) = J₂.

Such a bijection p is called an oid-isomorphism from J₁ to J₂.

The above definition implies that oid-isomorphic instances have the same
constants. Formally, if J₁ and J₂ are oid-isomorphic then consts(J₁) = consts(J₂).

Definition 4.5. Let Q and Q' be two oCQs with the same head predicate, and 
over the same database schema S. Then Q and Q' are called oid-equivalent if 
for every instance I over S, the results Q(I) and Q'(I) are oid-isomorphic.
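
For small extended instances, oid-isomorphism in the sense of Definition 4.4 can be checked by brute force over all bijections between the two sets of oids. The Python sketch below does exactly that; it is exponential and only meant to make the definition concrete, not to reflect the characterization developed in this paper. The encoding (oids as tuples, constants as strings) is an assumption, and the two instances are the Family results of Example 4.1.

```python
from itertools import permutations

def is_oid(term):
    """In this sketch, oids are tuples (symbol, args) and constants are strings."""
    return isinstance(term, tuple)

def oid_isomorphism(J1, J2):
    """Return a dict p on oids(J1) witnessing oid-isomorphism, or None (brute force)."""
    adom1 = {t for _, args in J1 for t in args}
    adom2 = {t for _, args in J2 for t in args}
    oids1 = [t for t in adom1 if is_oid(t)]
    oids2 = [t for t in adom2 if is_oid(t)]
    consts1 = adom1 - set(oids1)
    consts2 = adom2 - set(oids2)
    if consts1 != consts2 or len(oids1) != len(oids2):
        return None
    for image in permutations(oids2):
        p = dict(zip(oids1, image))            # bijection on oids, identity on constants
        if {(pred, tuple(p.get(t, t) for t in args)) for pred, args in J1} == J2:
            return p
    return None

parents = {"beth": ("anne", "adam"), "ben": ("anne", "adam"),
           "eric": ("claire", "carl"), "emma": ("diana", "carl")}
Q_I = {("Family", (c, ("f", (m, d)))) for c, (m, d) in parents.items()}
Qp_I = {("Family", (c, ("g", (m, d, m)))) for c, (m, d) in parents.items()}
p = oid_isomorphism(Q_I, Qp_I)
print(p)
```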

Example 4.6. The queries in Example 4.1 are oid-equivalent. For example, for
the instance I of Table 1, the oid-isomorphism from Q(I) to Q'(I) is as follows:

f(anne, adam) ↦ g(anne, adam, anne)

f(claire, carl) ↦ g(claire, carl, claire)

f(diana, carl) ↦ g(diana, carl, diana).

I                 Q(I)              Q'(I)
a  b  c           a  f(a)           a  f(a, b, c)
a  d  e                             a  f(a, d, e)

Table 5: Instances used in Example 4.8



Example 4.7. Recall the query Q from Example 3.2:

T(x, f(y)) ← R(x, y, z)

Also consider the following variation Q' of Q:

T(x, f(x, y)) ← R(x, y, z)

Then Q and Q' are not oid-equivalent, as shown by the simple instances in
Table 4. Indeed, there cannot be an oid-isomorphism from Q(I) to Q'(I) because
Q(I) contains only one distinct oid while Q'(I) contains two distinct oids.

Example 4.8. As a variant of Example 4.7, consider the following two oCQs:

Q = T(x, f(x)) ← R(x, y, z)

Q' = T(x, f(x, y, z)) ← R(x, y, z)

Again these two oCQs are not oid-equivalent, as shown by the counterexample
instances in Table 5.
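
Since an oid-isomorphism is a bijection that maps oids to oids, the number of distinct oids is preserved by it, so a difference in that number refutes oid-equivalence. The following Python sketch counts the oids for the instances of Tables 4 and 5. The function evaluate is a hypothetical helper specialized to heads of the form T(x, f(...)) over the single body atom R(x, y, z).

```python
def evaluate(creation_vars, facts):
    """Evaluate T(x, f(creation_vars)) <- R(x, y, z) on a set of R-triples."""
    result = set()
    for fact in facts:
        alpha = dict(zip(("x", "y", "z"), fact))    # the matching induced by this fact
        oid = ("f", tuple(alpha[v] for v in creation_vars))
        result.add((alpha["x"], oid))
    return result

def distinct_oids(result):
    """The set of created oids occurring in a result."""
    return {oid for _, oid in result}

table4 = {("a", "b", "c"), ("d", "b", "e")}
table5 = {("a", "b", "c"), ("a", "d", "e")}

# Example 4.7: f(y) versus f(x, y) on the instance of Table 4
counts_47 = (len(distinct_oids(evaluate(("y",), table4))),
             len(distinct_oids(evaluate(("x", "y"), table4))))
# Example 4.8: f(x) versus f(x, y, z) on the instance of Table 5
counts_48 = (len(distinct_oids(evaluate(("x",), table5))),
             len(distinct_oids(evaluate(("x", "y", "z"), table5))))
print(counts_47, counts_48)
```

In both examples the first query creates one oid and the second creates two, which rules out an oid-isomorphism.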

4.2 Homomorphisms and containment of conjunctive queries 

The characterizations we will give for oid-equivalence of sifo CQs depend on 
the classical notions of homomorphism and containment between conjunctive 
queries. Let us briefly recall these notions now.

A variable mapping is a mapping h from a finite set X of variables to another
finite set Y of variables. If A is an atom with variables in X, then we can apply
h to each variable occurrence in A to obtain an atom with variables in Y, which
we denote by h(A). If B is a set of atoms with var(B) ⊆ X, then we naturally
define h(B) := {h(A) | A ∈ B}.

For two sets B and B' of atoms, a variable mapping h : var(B) → var(B')
is called a homomorphism from B to B' if h(B) ⊆ B'. This is denoted by
h : B → B'. The notion of homomorphism is extended to conjunctive queries
Q = H ← B and Q' = H' ← B' as follows. A homomorphism from Q to Q'
is a homomorphism h : B → B' such that h(H) = H'. This is denoted by

h : Q → Q'.

A classical result relates homomorphisms between conjunctive queries to
containment. Let Q and Q' be two conjunctive queries over a common database
schema S. We say that Q' is contained in Q if for every instance I of S, we
have Q'(I) ⊆ Q(I). The classical result states that Q' is contained in Q if and
only if there exists a homomorphism h : Q → Q'.

Two queries Q and Q' are equivalent if for every instance I of S, we have
Q(I) = Q'(I). Since equivalence amounts to containment in both directions,
two conjunctive queries are equivalent if and only if there exist homomorphisms
between them in both directions.
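
The homomorphism criterion translates directly into a search procedure. The Python sketch below enumerates homomorphisms by backtracking and uses them to test containment and equivalence of classical CQs. A query is encoded as a pair (head, body) of atoms (predicate, tuple of variables); the predicate names Ans and E are hypothetical.

```python
def homomorphisms(source, target, h=None):
    """Yield variable mappings h with h(source) contained in target (lists of atoms)."""
    h = h or {}
    if not source:
        yield dict(h)
        return
    (pred, args), rest = source[0], source[1:]
    for t_pred, t_args in target:
        if t_pred != pred or len(t_args) != len(args):
            continue
        extended = dict(h)
        if all(extended.setdefault(v, w) == w for v, w in zip(args, t_args)):
            yield from homomorphisms(rest, target, extended)

def contained_in(Qp, Q):
    """True iff Q' is contained in Q, i.e. some homomorphism h : Q -> Q' exists."""
    (head, body), (head_p, body_p) = Q, Qp
    if head[0] != head_p[0] or len(head[1]) != len(head_p[1]):
        return False
    seed = {}
    # Enforce h(H) = H' before searching for a homomorphism on the bodies.
    if not all(seed.setdefault(v, w) == w for v, w in zip(head[1], head_p[1])):
        return False
    return any(True for _ in homomorphisms(body, body_p, seed))

def equivalent(Q1, Q2):
    """Equivalence as containment in both directions."""
    return contained_in(Q1, Q2) and contained_in(Q2, Q1)

Q = (("Ans", ("x",)), [("E", ("x", "y")), ("E", ("y", "z"))])
Qp = (("Ans", ("x",)), [("E", ("x", "x"))])
Q1 = (("Ans", ("x",)), [("E", ("x", "y"))])
Q2 = (("Ans", ("x",)), [("E", ("x", "y")), ("E", ("x", "z"))])
print(contained_in(Qp, Q), contained_in(Q, Qp), equivalent(Q1, Q2))
```

Here Q' (a self-loop on x) is contained in Q (a path of length two from x), but not conversely, while Q1 and Q2 are equivalent because the second atom of Q2 is redundant.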

4.3 A normal form for oid-equivalence problems

In this subsection we consider two arbitrary sifo CQs Q, Q' with the same head 
predicate: 


Q = T(x̄, f(z̄)) ← B
Q' = T(x̄', f'(z̄')) ← B'.

Then x̄ and x̄' have equal length. Note that x̄ and z̄ as well as x̄' and z̄' may
have variables in common.

Our aim is to show that oid-equivalence between arbitrary sifo CQs Q and
Q' can be reduced to the case where the heads

T(x̄, f(z̄)) and T(x̄', f'(z̄'))

have identical arguments, that is, where x̄ = x̄' and z̄ = z̄'.

As a first lemma we state that rearranging the creation variables of a query 
does not affect oid-equivalence. 

Lemma 4.9 (Rearranging creation variables). Let Q be a sifo CQ written as
above. Let ū be a tuple with exactly the same variables as z̄, but possibly with
different repetitions and a different ordering, and let g be a function symbol
whose arity is equal to the length of ū. Then the sifo CQ P = T(x̄, g(ū)) ← B
is oid-equivalent to Q.

Proof. Let I be an instance. We define an oid-isomorphism from Q(I) to P(I) as
follows. Any oid o in Q(I) is of the form f(α(z̄)) for some matching α : B → I;
we define p(o) := g(α(ū)). This is well-defined, i.e., independent of the choice
of α. Indeed, if the data terms f(α₁(z̄)) and f(α₂(z̄)) are equal, then the tuples
α₁(z̄) and α₂(z̄) are equal, which implies that α₁ and α₂ agree on every variable
appearing in z̄. Since exactly the same variables appear in ū, also the tuples
α₁(ū) and α₂(ū) are equal, whence g(α₁(ū)) = g(α₂(ū)).

That p : oids(Q(I)) → oids(P(I)) is injective is shown by an analogous
argument. The surjectivity of p, as well as the equality p(Q(I)) = P(I), are
clear. □

By the above lemma, we can remove all duplicates from z̄ and z̄' in the heads
of Q and Q', respectively. So, from now on we may assume z̄ and z̄' have no
duplicates.

In the following, let Z equal the set of variables occurring in z̄, let X equal
the set of variables occurring in x̄, and let Z' and X' be defined similarly.

We next show that two sifo CQs can only be oid-equivalent if they have
identical patterns of distinguished variables, up to renaming.

Lemma 4.10 (Renaming distinguished variables). If Q and Q' are oid-equivalent,
then there exists a bijective variable mapping σ : X → X' such that σ(x̄) = x̄'.

Proof. Certainly, if Q and Q' are oid-equivalent, then the conjunctive queries
Q₀ = T₀(x̄) ← B and Q'₀ = T₀(x̄') ← B', where T₀ is a new predicate symbol,
are equivalent. So, there are homomorphisms h : Q₀ → Q'₀ and h' : Q'₀ → Q₀.
In particular, h(x̄) = x̄' and h'(x̄') = x̄. We define σ to be the restriction of h to
X. The claim σ(x̄) = x̄' and the surjectivity of σ are then clear. So it remains
to show that σ is injective. To this end, consider h'(σ(x̄)) = h'(h(x̄)) = h'(x̄') = x̄.
We see that h' ∘ σ is the identity on X and thus injective. Hence, σ must be
injective as well. □

By the above lemma, if there does not exist a renaming σ as in the lemma,
certainly Q and Q' are not oid-equivalent. If there exists such a renaming, then
by renaming the variables in one of the two queries, we can now assume without
loss of generality that x̄ = x̄' and in particular that X = X'.

The next step is to show that oid-equivalent queries must have the same
distinguished variables among the creation variables, that is, X ∩ Z = X ∩ Z'.

Lemma 4.11 (Distinguished creation variables). If X ∩ Z ≠ X ∩ Z', then Q
and Q' are not oid-equivalent.

Proof. Either there exists some x ∈ X ∩ Z that is not in Z', or vice versa. By
symmetry we may assume the first possibility.

We construct an instance I from B'. In doing this, to keep our notation
simple, we consider the variables in B' to be constants. The instance I is
obtained from B' by duplicating x to some new element x₂. Formally, consider the
mapping d on var(B') that is the identity everywhere except that x is mapped
to x₂; then I = B' ∪ d(B').

First, let us look at Q'(I). Using the identity matching that maps every
variable to itself, we obtain the extended fact T(x̄, f'(z̄')) ∈ Q'(I). Using the
matching d defined above, we obtain the extended fact T(x̄₂, f'(d(z̄'))) in Q'(I).
Here, x̄₂ denotes d(x̄), i.e., x̄₂ is obtained from x̄ by replacing x with x₂. Since
x does not belong to Z', we have d(z̄') = z̄', so T(x̄₂, f'(z̄')) ∈ Q'(I).

On the other hand, in Q(I) consider any two extended facts T(α₁(x̄), f(α₁(z̄)))
and T(α₂(x̄), f(α₂(z̄))), with matchings α₁ : B → I and α₂ : B → I, such that
α₁(x̄) = x̄ and α₂(x̄) = x̄₂. Then in particular α₁(x) = x and α₂(x) = x₂. Since
α₁ and α₂ differ on x, and x is in Z, also α₁(z̄) and α₂(z̄) are different. Hence,
the two last components f(α₁(z̄)) and f(α₂(z̄)) are different. Thus, we see that
in Q(I) it is impossible to have two extended facts T(x̄, o) and T(x̄₂, o) with
the same oid o. But we have seen this is possible in Q'(I), so Q(I) and Q'(I)
are not oid-isomorphic and Q and Q' cannot be oid-equivalent. □
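
The construction in this proof can be tried out on a concrete pair of queries. In the Python sketch below, Q = T(x, f(x, y)) ← R(x, y, z) and Q' = T(x, f'(y)) ← R(x, y, z) are a hypothetical pair chosen so that x is a distinguished creation variable of Q but not of Q'. The helper names and the name x2 for the duplicate of x are our own.

```python
def matchings(body, instance, valuation=None):
    """Yield every valuation alpha with alpha(body) contained in instance."""
    valuation = valuation or {}
    if not body:
        yield dict(valuation)
        return
    (pred, args), rest = body[0], body[1:]
    for fact_pred, fact_args in instance:
        if fact_pred != pred or len(fact_args) != len(args):
            continue
        extended = dict(valuation)
        if all(extended.setdefault(v, c) == c for v, c in zip(args, fact_args)):
            yield from matchings(rest, instance, extended)

def evaluate_sifo(head_pred, x_vars, symbol, z_vars, body, instance):
    """Result of the sifo CQ  head_pred(x_vars, symbol(z_vars)) <- body."""
    return {(head_pred, tuple(a[v] for v in x_vars) + ((symbol, tuple(a[v] for v in z_vars)),))
            for a in matchings(body, instance)}

def duplicate(body, var, copy):
    """Build I = B' ∪ d(B'), where d renames var to copy and fixes everything else."""
    original = {(pred, tuple(args)) for pred, args in body}
    renamed = {(pred, tuple(copy if v == var else v for v in args)) for pred, args in body}
    return original | renamed

def share_an_oid(result, a, b):
    """True iff facts T(a, o) and T(b, o) with a common oid o both occur."""
    oids = lambda c: {args[-1] for _, args in result if args[0] == c}
    return bool(oids(a) & oids(b))

body = [("R", ("x", "y", "z"))]                  # here B = B'
I = duplicate(body, "x", "x2")
Q_I = evaluate_sifo("T", ("x",), "f", ("x", "y"), body, I)
Qp_I = evaluate_sifo("T", ("x",), "f'", ("y",), body, I)
print(share_an_oid(Qp_I, "x", "x2"), share_an_oid(Q_I, "x", "x2"))
```

In Q'(I) the elements x and x2 are linked to the same oid, whereas in Q(I) they are not, so no oid-isomorphism between the two results can exist.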

By the above Lemma we now assume X fl Z = X fl Z'. The last step is to 
show that Z — X and Z' — X , the sets of non-distinguished creation variables, 
need to have the same cardinality. 
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Lemma 4.12 (Non-distinguished creation variables). IfZ — X andZ' — X have 
different cardinality then Q and Q' are not oid- equivalent. 

Proof. As in the proof of Lemma 4.11, we consider $B$ as an instance, viewing
variables as constants.

Let $k$ and $k'$ be the cardinalities of $Z - X$ and $Z' - X$, respectively. By
symmetry we may assume that $k > k'$. Now, for any natural number $n$, let
$I_n$ be the instance obtained from $B$ by independently multiplying each variable
$z \in Z - X$ into $n$ fresh copies $z^{(1)}, \dots, z^{(n)}$. Formally, for any function
$d : Z - X \to \{1, \dots, n\}$, let $\hat d$ be the valuation on var($B$) that maps each $z \in Z - X$
to $z^{(d(z))}$ and that is the identity on all other variables. Then

\[ I_n = \bigcup_{d} \hat d(B). \]

There are $n^k$ different functions $d : Z - X \to \{1, \dots, n\}$. Each corresponding
valuation $\hat d$ is a matching of $B$ in $I_n$; all these matchings are the identity on $\bar x$
but are pairwise different on $\bar z$. Thus there are at least $n^k$ different extended
facts in $Q(I_n)$ of the form $T(\bar x, o)$.

On the other hand, consider any set $S$ of valuations from $X \cup Z'$ to adom($I_n$)
that are pairwise different on $Z' - X$ but that all agree on $X$. The cardinality
of $Z' - X$ is $k'$. The cardinality of adom($I_n$) is $O(n)$ (although the cardinality
of $I_n$ itself is larger). Hence, such a set $S$ can be of cardinality at most $O(n^{k'})$.
Consequently, since $k > k'$, for $n$ large enough, $Q'(I_n)$ cannot possibly contain
$n^k$ different extended facts of the form $T(\bar x, o)$. But we saw that this is possible
in $Q(I_n)$. So, $Q(I_n)$ and $Q'(I_n)$ are not oid-isomorphic and $Q$ and $Q'$ cannot be
oid-equivalent. □
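
The counting argument can be observed numerically. The sketch below is a hedged illustration: the queries $T(x, f(z_1, z_2)) \leftarrow R(x, z_1), R(x, z_2)$ (with $k = 2$) and $T(x, f'(z_1)) \leftarrow R(x, z_1)$ (with $k' = 1$) are our own hypothetical example, and the helper names `matchings`, `build_I_n`, `oid_count` and `counts` are assumptions made for this illustration only.

```python
from itertools import product


def matchings(body, instance):
    """All valuations of the body's variables that map every atom to a fact of the instance."""
    partial = [{}]
    for rel, args in body:
        extended = []
        for val in partial:
            for fact_rel, fact_args in instance:
                if fact_rel != rel or len(fact_args) != len(args):
                    continue
                new = dict(val)
                if all(new.setdefault(v, c) == c for v, c in zip(args, fact_args)):
                    extended.append(new)
        partial = extended
    return partial


def build_I_n(body, multiplied_vars, n):
    """Union of d^(B) over all functions d from multiplied_vars to {1, ..., n}."""
    facts = set()
    for choice in product(range(1, n + 1), repeat=len(multiplied_vars)):
        d = dict(zip(multiplied_vars, choice))
        for rel, args in body:
            facts.add((rel, tuple(f"{v}^{d[v]}" if v in d else v for v in args)))
    return facts


def oid_count(x, c, creation_vars, body, instance):
    """Number of distinct oids created together with the binding x = c."""
    return len({tuple(m[v] for v in creation_vars)
                for m in matchings(body, instance) if m[x] == c})


B = [("R", ("x", "z1")), ("R", ("x", "z2"))]   # body of Q,  creation variables z1, z2
B_prime = [("R", ("x", "z1"))]                 # body of Q', creation variable z1


def counts(n):
    I_n = build_I_n(B, ("z1", "z2"), n)
    return (oid_count("x", "x", ("z1", "z2"), B, I_n),
            oid_count("x", "x", ("z1",), B_prime, I_n))


for n in (1, 2, 3):
    print(n, counts(n))
```

For this example the first count grows quadratically in $n$ while the second grows only linearly, which is exactly the gap between $n^k$ and $O(n^{k'})$ exploited in the proof.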

By the above lemma, and after renaming the variables in $Z' - X$ and reordering
the variables in $\bar z'$, we may now indeed assume that $\bar z$ and $\bar z'$ are identical.

4.4 Characterization of oid-equivalence 

According to the results of the preceding subsection, we are now given two sifo 
CQs as follows: 


\[ Q = T(\bar x, f(\bar z)) \leftarrow B \tag{1} \]

\[ Q' = T(\bar x, f'(\bar z)) \leftarrow B' \tag{2} \]

Note that Q and Q' have identical tuples x and z of distinguished and creation 
variables; moreover, z contains no variable more than once. As before, we denote 
the sets of distinguished and creation variables as X and Z, respectively. 

We will show that Q and Q' are oid-equivalent if and only if there are 
homomorphisms between B and B' in both directions that (i) keep x fixed and 
(ii) possibly permute the variables in z. To make this formal, we associate to 
each query a classical CQ without function symbols. 

Definition 4.13. Fix a new relation symbol $\tilde T$ of arity the sum of the lengths
of $\bar x$ and $\bar z$. The flattening of $Q$ is the query $\tilde Q = \tilde T(\bar x, \bar z) \leftarrow B$. The query $\tilde Q'$ is
defined similarly.

Let $\pi$ be a permutation of the set $Z - X$. We extend $\pi$ to var($B$) by defining
it to be the identity outside $Z - X$. We now define $\tilde Q^\pi$ to be the conjunctive
query obtained from $\tilde Q$ by permuting the variables in $\bar z$, that is

\[ \tilde Q^\pi = \tilde T(\bar x, \pi(\bar z)) \leftarrow B. \]

This notion allows us to formulate the following natural sufficient condition 
for oid-equivalence. 

Proposition 4.14. If there exists a permutation $\pi$ of $Z - X$ such that $\tilde Q^\pi$ and
$\tilde Q'$ are equivalent, then $Q$ and $Q'$ are oid-equivalent.

Proof. Let $I$ be an instance. We define an oid isomorphism $\rho$ from $Q(I)$ to
$Q'(I)$ as follows. Any oid $o$ in $Q(I)$ is of the form $f(\alpha(\bar z))$ for some matching
$\alpha : B \to I$; we define $\rho(o) := f'(\alpha(\pi(\bar z)))$. This is well-defined, i.e., independent
of the choice of $\alpha$. Indeed, if the data terms $f(\alpha_1(\bar z))$ and $f(\alpha_2(\bar z))$ are equal,
then the tuples $\alpha_1(\bar z)$ and $\alpha_2(\bar z)$ are equal, and consequently the permuted tuples
$\alpha_1(\pi(\bar z))$ and $\alpha_2(\pi(\bar z))$ are equal. Hence, $f'(\alpha_1(\pi(\bar z))) = f'(\alpha_2(\pi(\bar z)))$.

The injectivity of $\rho$ : oids($Q(I)$) $\to$ oids($Q'(I)$) is shown by an analogous
argument. The surjectivity of $\rho$, and the equality $\rho(Q(I)) = Q'(I)$, follow
readily from the equality $\tilde Q^\pi(I) = \tilde Q'(I)$. □

We next prove that the sufficient condition given by the above Proposition 
is actually also necessary for oid-equivalence. The key idea for proving this is 
to show that oid-equivalence of sifo CQs depends only on the number of oids 
generated for any binding of the distinguished variables. 

Formally, for any instance $I$ and any tuple $\bar c$ of elements from adom($I$), we
define

\[ \#_{\bar c}(Q, I) := \#\{\, o \mid T(\bar c, o) \in Q(I) \,\}, \]

that is, $\#_{\bar c}(Q, I)$ denotes the number of distinct oids $o$ that occur together
with $\bar c$ in $Q(I)$. We will show that $Q$ and $Q'$ are oid-equivalent if and only if
$\#_{\bar c}(Q, I) = \#_{\bar c}(Q', I)$ for all instances $I$ and tuples $\bar c$. The only-if direction of
this statement is obvious; the if-direction is less so.

For our proof, we rely on work by Cohen [14], who studied queries with multiset
variables that are evaluated under so-called combined semantics, a semantics
that combines set and multiset semantics. Cohen characterized equivalence of
such queries in terms of homomorphisms.

Queries with multiset variables (MV queries) have the form $Q_0, M$ where $Q_0$
is a standard CQ and $M$ is some set of variables of $Q_0$ that do not appear in the
head of $Q_0$. The elements of $M$ are called the multiset variables. Evaluating
an MV query $Q_0, M$ on an instance $I$ results in a multiset (bag) of facts, where
the number of times a fact occurs is related to the number of different possible
assignments of values to the multiset variables.

Let us define the combined semantics formally. Let $Q_0$ be of the form
$H_0 \leftarrow B_0$ and let $I$ be an input instance. Recall that $Q_0(I)$ according to the classical
semantics equals

\[ \{\, \alpha(H_0) \mid \alpha : B_0 \to I \,\}. \]

Let $W$ be the set of variables appearing in $H_0$. Then the result of evaluating
the MV query $Q_0, M$ on instance $I$ is defined to be the multiset with ground
set $Q_0(I)$, where for each fact $e \in Q_0(I)$, the multiplicity of $e$ in the multiset is
defined to be

\[ \#\{\, \gamma|_{W \cup M} \mid \gamma : B_0 \to I \text{ and } \gamma(H_0) = e \,\}. \]

That is, given a fact $\alpha(H_0) \in Q_0(I)$, there may be many different matchings
$\gamma$ that agree with $\alpha$ on $H_0$. The multiplicity of $\alpha(H_0)$ is defined to be not the
total number of different such matchings $\gamma$, but rather the number of different
restrictions one obtains when restricting these matchings $\gamma$ to $W \cup M$.
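
To make the combined semantics concrete, the following Python sketch evaluates an MV query by counting distinct restrictions of matchings to the head and multiset variables. It is a minimal illustration under our own assumptions: the function names `matchings` and `evaluate_mv`, the example query $T_0(x) \leftarrow R(x, y), S(y, z)$ and the example instance are hypothetical and do not come from Cohen's work.

```python
from collections import Counter


def matchings(body, instance):
    """All valuations of the body's variables that map every atom to a fact of the instance."""
    partial = [{}]
    for rel, args in body:
        extended = []
        for val in partial:
            for fact_rel, fact_args in instance:
                if fact_rel != rel or len(fact_args) != len(args):
                    continue
                new = dict(val)
                if all(new.setdefault(v, c) == c for v, c in zip(args, fact_args)):
                    extended.append(new)
        partial = extended
    return partial


def evaluate_mv(head_vars, multiset_vars, body, instance):
    """Multiset of head tuples; multiplicity = number of distinct restrictions to W and M."""
    keep = list(head_vars) + list(multiset_vars)
    restrictions = {tuple(m[v] for v in keep) for m in matchings(body, instance)}
    return Counter(r[:len(head_vars)] for r in restrictions)


body = [("R", ("x", "y")), ("S", ("y", "z"))]
instance = {("R", ("a", "1")), ("R", ("a", "2")),
            ("S", ("1", "p")), ("S", ("1", "q")), ("S", ("2", "p"))}

set_like = evaluate_mv(("x",), (), body, instance)           # M empty: set semantics
combined = evaluate_mv(("x",), ("y",), body, instance)       # M = {y}
bag_like = evaluate_mv(("x",), ("y", "z"), body, instance)   # M = {y, z}
print(set_like, combined, bag_like)
```

There are three matchings in total, but only two distinct values of $y$ among them, so the fact $T_0(a)$ has multiplicity 1, 2 and 3 for the three choices of $M$.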

Two MV queries are equivalent if they evaluate to the same multiset on
every input instance. Equivalence of MV queries can be characterized using the
notion of multiset-homomorphism. A multiset-homomorphism from MV
query $Q_0, M$ to MV query $Q'_0, M'$ is a homomorphism $h : Q_0 \to Q'_0$ such that
$h$ is injective on $M$ and $h(M) \subseteq M'$. Cohen showed the following:

Theorem 4.15 ([14], Thm 5.3). Two MV queries are equivalent if and only if
there are multiset homomorphisms between them in both directions.

To leverage this result on MV equivalence, we associate two MV queries to 
our given sifo CQs in the following way. 

Definition 4.16. Fix a new relation symbol $T_0$ of arity the length of $\bar x$. The
MV queries $\hat Q$ and $\hat Q'$ are defined as $Q_0, (Z - X)$ and $Q'_0, (Z - X)$ respectively,
where

\[ Q_0 = T_0(\bar x) \leftarrow B \]
\[ Q'_0 = T_0(\bar x) \leftarrow B' \]

The following proposition now relates oid-equivalence to MV-equivalence: 

Proposition 4.17. If $Q$ and $Q'$ are oid-equivalent, then the MV queries $\hat Q$ and
$\hat Q'$ are equivalent.

Proof. Let $I$ be an instance. We must show that the multisets $\hat Q(I)$ and $\hat Q'(I)$
are equal. Since $Q$ and $Q'$ are oid-equivalent, the ground sets $Q_0(I)$ and $Q'_0(I)$ of
$\hat Q(I)$ and $\hat Q'(I)$ are already equal. We must show that the element multiplicities
are the same as well.

4 The motivation for MV queries was to model the semantics of positive SQL queries with 
nested EXISTS subqueries. While queries under standard SQL semantics return multisets of 
tuples, only the relations mentioned in the top level SQL block contribute to the multiplicities 
of answers, whereas relations mentioned in the subquery do not. 

Let $T_0(\bar c)$ be an arbitrary element of $Q_0(I)$. By the semantics of oCQs, we
have the following equalities:

\[ \#_{\bar c}(Q, I) = \#\{\, \gamma|_{X \cup Z} \mid \gamma : B \to I \text{ and } \gamma(\bar x) = \bar c \,\} \]

\[ \#_{\bar c}(Q', I) = \#\{\, \gamma|_{X \cup Z} \mid \gamma : B' \to I \text{ and } \gamma(\bar x) = \bar c \,\} \]

Since $Q(I)$ and $Q'(I)$ are oid-isomorphic, the left-hand sides of the above two
equalities are equal. Hence, the right-hand sides are equal as well. But these
are precisely the multiplicities of $T_0(\bar c)$ in $\hat Q(I)$ and $\hat Q'(I)$ respectively. □

The following proposition further relates MV equivalence to equivalence of 
the flattenings up to permutation: 

Proposition 4.18. If the MV queries $\hat Q$ and $\hat Q'$ are equivalent, then there exists
a permutation $\pi$ of $Z - X$ such that $\tilde Q^\pi$ and $\tilde Q'$ are equivalent.

Proof. By Theorem 4.15 there exist a multiset homomorphism $h$ from $\hat Q$ to $\hat Q'$
and a multiset homomorphism $h'$ from $\hat Q'$ to $\hat Q$. Since Theorem 4.15 also implies
that $h$ is injective on $Z - X$ and that $h(Z - X) \subseteq Z - X$, we can conclude that
$h$ acts as a permutation on $Z - X$. Moreover, $h$ is the identity on $X$. The same
two properties hold for $h'$.

Now put $\pi = (h|_{Z-X})^{-1}$. Then $h : \tilde Q^\pi \to \tilde Q'$. So it remains to find a
homomorphism $h'' : \tilde Q' \to \tilde Q^\pi$. Thereto, note that $h'h$ acts as a permutation on
$Z - X$. Since $Z - X$ is finite, there exists a nonzero natural number $m$ such that
$(h'h)^m$ is the identity on $Z - X$. Equivalently, $(h'h)^{m-1}h'$ equals $\pi$ on $Z - X$.
We conclude that $(h'h)^{m-1}h'$ is the desired homomorphism $h''$. □

We summarize the three preceding Propositions in the following. 

Theorem 4.19. Consider two sifo CQs

\[ Q = T(\bar x, f(\bar z)) \leftarrow B \]
\[ Q' = T(\bar x, f'(\bar z)) \leftarrow B' \]

where $Q$ and $Q'$ have identical tuples $\bar x$ and $\bar z$ of distinguished and creation
variables, and where $\bar z$ contains no variable more than once. Denote the sets of
distinguished and creation variables by $X$ and $Z$, respectively.

The following are equivalent:

1. The sifo CQs $Q$ and $Q'$ are oid-equivalent;

2. The MV queries $\hat Q$ and $\hat Q'$ are equivalent;

3. There is a permutation $\pi$ of $Z - X$ such that the classical CQs $\tilde Q^\pi$ and $\tilde Q'$
are equivalent.

4.5 Computational complexity

The results of this section imply the following: 

Corollary 4.20. Testing oid-equivalence of sifo CQs is NP-complete. 

Proof. Assume given sifo CQs $Q$ and $Q'$ with the same head predicate:

\[ Q = T(\bar x, f(\bar z)) \leftarrow B \]
\[ Q' = T(\bar x', f'(\bar z')) \leftarrow B'. \]

Let $X$, $X'$, $Z$ and $Z'$ denote the sets of variables occurring in $\bar x$, $\bar x'$, $\bar z$ and $\bar z'$,
respectively.

To test oid-equivalence, we begin by removing duplicates in $\bar z$ and $\bar z'$, as
justified by Lemma 4.9. Note that $\bar x$ and $\bar x'$ have the same length $k$, because
of the fixed arity of $T$. So we can write $\bar x = x_1, \dots, x_k$ and $\bar x' = x'_1, \dots, x'_k$.
Consider the mapping $\sigma = \{(x_1, x'_1), \dots, (x_k, x'_k)\}$. We test if $\sigma$ is a bijection
from $X$ to $X'$; if not, then $Q$ and $Q'$ are not oid-equivalent by Lemma 4.10.
If $\sigma$ is a bijection, we can safely replace every variable $x'$ in $X'$ by $\sigma^{-1}(x')$,
which yields a sifo CQ that is oid-equivalent to $Q'$. Hence, from now on we may
assume that $\bar x = \bar x'$ and in particular $X = X'$.

Next, we test whether $X \cap Z = X \cap Z'$ and whether $Z - X$ and $Z' - X$ have
the same cardinality; if one of the two tests fails then $Q$ and $Q'$ are not
oid-equivalent by Lemmas 4.11 and 4.12. Otherwise, we can rename the variables
in $Z' - X$, so that we may assume that $\bar z = \bar z'$.

We are now left in the situation where $Q$ and $Q'$ are in the general forms
(1) and (2) from Subsection 4.4, to which Theorem 4.19 applies. By the third
statement of this theorem we can test oid-equivalence of $Q$ and $Q'$ in NP by
guessing a permutation $\pi$ and two homomorphisms between $\tilde Q^\pi$ and $\tilde Q'$ in both
directions.

NP-hardness follows immediately because the problem has equivalence of 
classical CQs as a special case, which is well known to be NP-hard. Indeed, oid 
equivalence of sifo CQs Q and Q' in the special case where the creation functions 
are nullary, amounts to classical equivalence when we ignore the function terms 
in the heads. □ 
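
The last step of the NP procedure can be illustrated by a deterministic brute-force search that replaces the guesses by enumeration. The Python sketch below assumes the queries are already in the normal form of Theorem 4.19 (identical tuples of distinguished and creation variables); the function names `matchings`, `has_hom` and `oid_equivalent`, as well as the three example queries, are hypothetical and chosen by us for illustration. It takes exponential time and is not meant as an efficient algorithm.

```python
from itertools import permutations


def matchings(body, instance):
    """All valuations of the body's variables that map every atom to a fact of the instance."""
    partial = [{}]
    for rel, args in body:
        extended = []
        for val in partial:
            for fact_rel, fact_args in instance:
                if fact_rel != rel or len(fact_args) != len(args):
                    continue
                new = dict(val)
                if all(new.setdefault(v, c) == c for v, c in zip(args, fact_args)):
                    extended.append(new)
        partial = extended
    return partial


def has_hom(head1, body1, head2, body2):
    """Is there a homomorphism from CQ (head1 <- body1) to CQ (head2 <- body2)?"""
    target = {(rel, tuple(args)) for rel, args in body2}
    return any(tuple(m[v] for v in head1) == tuple(head2)
               for m in matchings(body1, target))


def oid_equivalent(x, z, body, body_prime):
    """Statement 3 of Theorem 4.19: search a permutation pi of Z - X."""
    free = [v for v in z if v not in x]                    # the set Z - X
    head_prime = tuple(x) + tuple(z)                       # head of the flattening of Q'
    for image in permutations(free):
        pi = dict(zip(free, image))
        head_pi = tuple(x) + tuple(pi.get(v, v) for v in z)  # head of the permuted flattening of Q
        if (has_hom(head_pi, body, head_prime, body_prime)
                and has_hom(head_prime, body_prime, head_pi, body)):
            return True
    return False


B1 = [("R", ("x", "z1")), ("S", ("x", "z2"))]
B2 = [("R", ("x", "z2")), ("S", ("x", "z1"))]   # same as B1 with z1 and z2 swapped
B3 = [("R", ("x", "z1")), ("R", ("x", "z2"))]

print(oid_equivalent(("x",), ("z1", "z2"), B1, B2))   # equivalent via the swap of z1 and z2
print(oid_equivalent(("x",), ("z1", "z2"), B1, B3))   # no homomorphism can map the S-atom
```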

5 Logical entailment of sifo CQs interpreted as 
schema mappings 

Object-creating CQs, and sifo CQs in particular, can alternatively be interpreted
as schema mappings rather than as queries. Specifically, consider a sifo
CQ $Q$ of the general form $T(\bar x, f(\bar z)) \leftarrow B$ over the database schema $S$. Let $\bar v$ be
the sequence of all variables used in $B$. Then we may view $Q$ as a second-order
implicational statement over the augmented schema $S \cup \{T\}$, as follows:

\[ \exists f\, \forall \bar v\, (B \to H) \]

Family            Family
beth  jones       beth  jones
ben   jones       ben   jones
eric  jones       eric  simpson
emma  jones       emma  smith

Table 6: Instances $J_1$ and $J_2$ from Example 5.1

Family
beth  jones
ben   murphy
eric  simpson
emma  smith

Table 7: Instance $J_3$ from Example 5.1


Here, $H$ is the head and $B$ is conveniently used to stand for the conjunction
of its elements. Note that this formula is second-order because it existentially
quantifies a function $f$; we denote the above formula by sotgd($Q$). This formula
belongs to the well-known class of second-order tuple-generating dependencies 
(SO-tgds). More specifically, it is a plain SO-tgd [f]. 

Syntactically, the plain SO-tgds coming from sifo CQs in this manner form 
a restricted class of SO-tgds, defined by the following restrictions: 

• A plain SO-tgd may consist of multiple rules; a sifo CQ consists of a single
rule.

• The head of a plain SO-tgd may consist of multiple atoms; the head of a
sifo CQ consists of a single atom. (This is similar to GAV mappings [27],
although the classical notion of GAV mapping does not use function
symbols.)

• There is only one function symbol, which moreover can be applied only 
once in the head. 

When interpreting a sifo CQ $Q$ as an SO-tgd, the semantics becomes that of a
schema mapping. Specifically, let $I$ be an instance over $S$, considered as a source
instance, and let $J$ be an instance over $\{T\}$, considered as a target instance.
Then $(I, J)$ together form an instance over the augmented schema $S \cup \{T\}$.
Now we say that $(I, J)$ satisfies $Q$, denoted by $(I, J) \models Q$, if the structure
(adom($I$) $\cup$ adom($J$), $I$, $J$) satisfies sotgd($Q$) under the standard semantics of
second-order logic, using adom($I$) $\cup$ adom($J$) as the universe of the structure.

The following example and remark illustrate that the semantics of sifo CQs 
as SO-tgds is quite different from their semantics as object-creating queries. 

Example 5.1. Let us consider again our query from Example 1. As we have 
mentioned in the Introduction, we can now write it as an SO-tgd as follows: 

\[ \exists f\, \forall x\, \forall y\, \forall c\, (\mathit{Mother}(c, x) \land \mathit{Father}(c, y) \to \mathit{Family}(c, f(x, y))) \]


Take the instance $I$ consisting of the Mother and Father facts listed in
Table 3.1 and take the instances $J_1$ and $J_2$ consisting of the Family facts listed in
Table 6, left and right respectively. Then both pairs $(I, J_1)$ and $(I, J_2)$ satisfy
the SO-tgd. For $J_1$ this is witnessed by the following function $f$:

x       y      f(x, y)
anne    adam   jones
claire  carl   simpson
diana   carl   smith


For $J_2$ this is witnessed by the function that simply maps everything to jones.

In contrast, for $J_3$ consisting of the Family facts listed in Table 7, the pair
$(I, J_3)$ does not satisfy the SO-tgd. Indeed, suppose there existed a function
$f$ witnessing the truth of the formula on $(I, J_3)$. Since beth has anne as mother
and adam as father, the fact

Family(beth, f(anne, adam))

must belong to $J_3$. The only Family-fact with beth in the first position is

Family(beth, jones),

so we conclude

f(anne, adam) = jones.

Furthermore, since ben also has anne as mother and adam as father, the fact

Family(ben, f(anne, adam))

must be in $J_3$. The only Family-fact with ben in the first position is

Family(ben, murphy),

however, so we must conclude that

f(anne, adam) = murphy,

which is in contradiction with the previous conclusion.
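
The reasoning of this example amounts to a simple check: for every value of the creation tuple, a single value must serve as the last component for all matchings that share it. The Python sketch below is a hedged illustration of that check. The function names `matchings` and `satisfies` are our own; the Mother and Father facts are an assumption reconstructed from the text of the example and from the witnessing function shown above, and the target instances are named by their content rather than by index.

```python
def matchings(body, instance):
    """All valuations of the body's variables that map every atom to a fact of the instance."""
    partial = [{}]
    for rel, args in body:
        extended = []
        for val in partial:
            for fact_rel, fact_args in instance:
                if fact_rel != rel or len(fact_args) != len(args):
                    continue
                new = dict(val)
                if all(new.setdefault(v, c) == c for v, c in zip(args, fact_args)):
                    extended.append(new)
        partial = extended
    return partial


def satisfies(head_vars, creation_vars, body, I, J, target):
    """Does (I, J) satisfy the SO-tgd of target(head_vars, f(creation_vars)) <- body?"""
    required = {}   # value of the creation tuple -> head tuples needing one common f-value
    for m in matchings(body, I):
        key = tuple(m[v] for v in creation_vars)
        required.setdefault(key, set()).add(tuple(m[v] for v in head_vars))
    universe = {c for _, args in I | J for c in args}
    return all(
        any(all((target, head + (o,)) in J for head in heads) for o in universe)
        for heads in required.values()
    )


# Assumed source instance (inferred from the example, not copied from a table).
parents = {"beth": ("anne", "adam"), "ben": ("anne", "adam"),
           "eric": ("claire", "carl"), "emma": ("diana", "carl")}
I = ({("Mother", (c, m)) for c, (m, _) in parents.items()}
     | {("Father", (c, f)) for c, (_, f) in parents.items()})


def family(names):
    return {("Family", (child, name)) for child, name in names.items()}


J_all_jones = family({"beth": "jones", "ben": "jones", "eric": "jones", "emma": "jones"})
J_distinct = family({"beth": "jones", "ben": "jones", "eric": "simpson", "emma": "smith"})
J_3 = family({"beth": "jones", "ben": "murphy", "eric": "simpson", "emma": "smith"})

body = [("Mother", ("c", "x")), ("Father", ("c", "y"))]
for J in (J_all_jones, J_distinct, J_3):
    print(satisfies(("c",), ("x", "y"), body, I, J, "Family"))
```

Only the third instance fails, because beth and ben share both parents but are assigned different family names.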

Remark 5.2. Note that, by the purely implicational nature of SO-tgds, if $(I, J)$
satisfies an SO-tgd and $J \subseteq J'$, then also $(I, J')$ satisfies the SO-tgd. Hence,
continuing the previous example, for any instance $J'$ obtained from $J_1$ or $J_2$ by
adding some more Family-facts, the pair $(I, J')$ would still satisfy the SO-tgd
from the example. □

The above example and remark show that given a source instance $I$, there
are in general multiple possible target instances $J$ such that $(I, J) \models Q$. This is
in contrast to the semantics of $Q$ as an oCQ, where $Q(I)$ is an extended instance
that is uniquely defined. Still, there is a connection between the oCQ semantics
and the SO-tgd semantics. Specifically, $Q(I)$ can be viewed as a target instance
in a canonical manner, using oid-to-constant assignments (oc-assignments for
short) defined as follows.

Definition 5.3. Let $I$ be a source instance and let $J$ be an extended instance
over $\{T\}$ such that consts($J$) $\subseteq$ adom($I$). An oc-assignment for $J$ with respect
to $I$ is an injective mapping $\rho$ : oids($J$) $\to$ dom such that the image of $\rho$ is disjoint
from adom($I$).

Thus, $\rho$ assigns to each non-constant data term from $J$ a different constant
that is not in adom($I$).

We now observe the following obvious property giving a connection between
the oCQ semantics and the SO-tgd semantics:

Proposition 5.4. Let $I$ be a source instance and let $\rho$ be an oc-assignment for
$Q(I)$ with respect to $I$. Then $(I, \rho(Q(I))) \models Q$.

In fact, Q(I) corresponds to what Fagin et al. [H] call the chase of I with 
sotgd (Q). 

5.1 Nested dependencies 

We have introduced sifo CQs as a restricted class of plain SO-tgds. But actually,
sifo CQs can also be considered as a restricted form of so-called nested tgds.
Thereto, consider again a sifo CQ of the general form $T(\bar x, f(\bar z)) \leftarrow B$. Let $\bar u$
be the sequence of all variables from $B$, except for the creation variables (the
variables from $\bar z$). Furthermore, let $w$ be a fresh variable not occurring in $B$,
and let $H'$ be the atom $T(\bar x, w)$. We can now associate to $Q$ the following
implicational statement, denoted by ntgd($Q$):

\[ \forall \bar z\, \exists w\, \forall \bar u\, (B \to H') \]

Note that ntgd($Q$) is now a first-order formula, but it is clear that ntgd($Q$) is
logically equivalent to sotgd($Q$). Hence, the schema mappings arising from sifo
CQs are not essentially second-order in nature.

5.2 Logical entailment 

In Section 4 we have shown that equivalence of sifo CQs as object-creating
queries is decidable. Now that we have seen that sifo CQs can also be given a
semantics as schema mappings, we may again ask if equivalence under this
alternative semantics is decidable. The answer is affirmative; we have seen in the
previous subsection that sifo CQ mappings belong to the class of nested
dependencies, and logical implication of nested dependencies has recently been shown
to be decidable [26]. When this general implication test for nested dependencies
is applied specifically to sifo CQ schema mappings, it can be implemented in
non-deterministic polynomial time. Hence, logical entailment (and also logical
equivalence) of sifo CQ schema mappings is NP-complete.

In the present section, we present a specialized logical entailment test for sifo
CQ schema mappings which is much simpler and more elegant, and provides
more insight into the problem by relating it to testing implication of a join
dependency by a conjunctive query (Theorem 5.10). Interestingly, there is a striking

I                 J
a1  b  c          a1  d1
a2  b  c          a2  d2

Table 8: Instances used in Example 5.6


correspondence between the general implication test when applied to sifo CQs,
and the strategy we use to prove our theorem. An in-depth comparison will be
given in Section 6, after we have stated the theorem formally and have seen its
proof.

Formally, given two schema mappings $\mathcal M$ and $\mathcal M'$ from a source schema $S$
to a target schema $\{T\}$, we say that $\mathcal M$ logically entails $\mathcal M'$ if the following
implication holds for every instance $I$ over $S$ and every instance $J$ over $\{T\}$:

\[ (I, J) \text{ satisfies } \mathcal M \;\Rightarrow\; (I, J) \text{ satisfies } \mathcal M'. \]

Referring to the view of sifo CQs as SO-tgds introduced above, we now 
define: 

Definition 5.5. Let Q and Q' be two sifo CQs with the same head predicate, 
and over the same database schema. We say that Q logically entails Q' if 
sotgd(Q) logically entails sotgd(Q'). 

Example 5.6. Recall the sifo CQs $Q$ and $Q'$ from Example 4.7:

\[ Q = T(x, f(y)) \leftarrow R(x, y, z) \]

\[ Q' = T(x, f'(x, y)) \leftarrow R(x, y, z) \]

It is clear that $Q$ logically entails $Q'$. Indeed, if there exists a function $f$
witnessing the truth of sotgd($Q$), then we can easily define a function $f'$ witnessing
the truth of sotgd($Q'$) by defining $f'(x, y) := f(y)$.

Conversely, however, $Q'$ does not logically entail $Q$. Indeed, Table 8 shows
$(I, J)$ where $(I, J) \models Q'$ but $(I, J) \not\models Q$.
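
The two claims about Table 8 can be verified by hand. Reading $I$ as the $R$-facts $R(a_1, b, c)$ and $R(a_2, b, c)$ and $J$ as the $T$-facts $T(a_1, d_1)$ and $T(a_2, d_2)$, a short worked check is:

```latex
\begin{align*}
(I, J) \models Q' &: \quad f'(a_1, b) := d_1,\ f'(a_2, b) := d_2
  \ \text{gives}\ T(a_1, f'(a_1, b)),\ T(a_2, f'(a_2, b)) \in J; \\
(I, J) \not\models Q &: \quad T(a_1, f(b)),\ T(a_2, f(b)) \in J
  \ \text{would force}\ f(b) = d_1 \ \text{and}\ f(b) = d_2,\ \text{but}\ d_1 \neq d_2.
\end{align*}
```

The witnessing function for $Q'$ can distinguish the two facts through its first argument, whereas any function for $Q$ only sees the shared value $b$.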

Example 5.7. Recall the sifo CQs $Q$ and $Q'$ from Example 4.8:

\[ Q = T(x, f(x)) \leftarrow R(x, y, z) \]

\[ Q' = T(x, f'(x, y, z)) \leftarrow R(x, y, z) \]

Although $Q$ and $Q'$ are not oid-equivalent, they are logically equivalent: they
logically entail each other. The logical entailment of $Q'$ by $Q$ is again clear. To
see the converse direction, assume $f'$ witnesses the truth of sotgd($Q'$). Then we
define $f(x)$ for any $x$ as follows: if there exists a pair $(y, z)$ such that $R(x, y, z)$
holds, we fix one such pair $(y, z)$ arbitrarily and define $f(x) := f'(x, y, z)$. If no
such $y$ and $z$ exist, we may define $f(x)$ arbitrarily. It is now clear that this $f$
witnesses the truth of sotgd($Q$).

Example 5.8. Consider the sifo CQs:

\[ Q = T(x, f(z_1)) \leftarrow R(z_1, x), R(z_1, z_2) \]

\[ Q' = T(x, f'(z_1, z_2)) \leftarrow R(z_1, x), R(z_1, z_2) \]

Also here, $Q$ and $Q'$ logically entail each other. The logical entailment of $Q'$ by
$Q$ is again clear. To see the converse direction, we can use a reasoning similar
to that used in Example 5.7. Assume $f'$ witnesses the truth of sotgd($Q'$). Then
we define $f(z_1)$ for any $z_1$ as follows: if there exists $z_2$ such that $R(z_1, z_2)$ holds,
we fix one such $z_2$ arbitrarily and define $f(z_1) := f'(z_1, z_2)$. If no such $z_2$ exists,
we may define $f(z_1)$ arbitrarily. The function $f$ thus defined witnesses the truth
of sotgd($Q$).

Note that the kind of reasoning used here and in Example 5.7 does not work
in the case of Example 5.6. In Theorem 5.10 we will characterize formally when
this kind of reasoning is correct.

Example 5.7 shows that logical equivalence (logical entailment in both
directions) does not imply oid-equivalence of sifo CQs. We will see in Theorem 5.12
that the other direction does hold.

5.3 Join dependencies and tableau queries 

In our characterization of sifo CQ logical entailment we use a number of concepts
from classical relational database theory [2], which we recall here briefly.

Recall that a relation scheme is a finite set of elements called attributes. It
is customary to denote the union of two relation schemes $X$ and $Y$ by
juxtaposition, thus writing $XY$ for $X \cup Y$.

A tuple over a relation scheme $U$ is a function from $U$ to dom. A relation
over $U$ is a finite set of tuples over $U$.

Let $t$ be a tuple over $U$ and let $X \subseteq U$. The restriction of $t$ to $X$ is denoted
by $t[X]$. The projection $\pi_X(r)$ of a relation $r$ over $U$ equals $\{\, t[X] \mid t \in r \,\}$.

We now turn to tableau queries, which are an alternative formalization of
conjunctive queries so that the result of a query is a set of tuples rather than a
set of facts. Let $S$ be a database schema, and let $B$ be a finite set of atoms with
predicates from $S$, as would be the body of a conjunctive query over $S$. Let
$V$ = var($B$). For any $U \subseteq V$, the pair $(B, U)$ is called a tableau query over $S$.
When applied to an instance $I$ over $S$, this tableau query returns a relation over
$U$ in the following manner. Let Mat($B$, $I$) be the set of all matchings of $B$ in $I$.
Using variables for attributes, $V$ can be viewed as a relation scheme. Under this
view, every valuation on $V$ is a tuple over $V$, and thus Mat($B$, $I$) is a relation
over $V$. We now define the result of $(B, U)$ on input $I$ to be $\pi_U$(Mat($B$, $I$)).
This result is denoted by $(B, U)(I)$.
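
As a small illustration of this definition, the following Python sketch computes the result of a tableau query by projecting the set of matchings. The function names `matchings` and `tableau_result`, the representation of a tuple over $U$ as a frozenset of (attribute, value) pairs, and the example body and instance are our own assumptions.

```python
def matchings(body, instance):
    """Mat(B, I): all valuations of var(B) that map every atom to a fact of the instance."""
    partial = [{}]
    for rel, args in body:
        extended = []
        for val in partial:
            for fact_rel, fact_args in instance:
                if fact_rel != rel or len(fact_args) != len(args):
                    continue
                new = dict(val)
                if all(new.setdefault(v, c) == c for v, c in zip(args, fact_args)):
                    extended.append(new)
        partial = extended
    return partial


def tableau_result(body, U, instance):
    """(B, U)(I): the projection onto U of Mat(B, I), each tuple as a frozenset of pairs."""
    return {frozenset((u, m[u]) for u in U) for m in matchings(body, instance)}


body = [("R", ("x", "y")), ("S", ("y", "z"))]
instance = {("R", ("a", "1")), ("R", ("b", "1")), ("S", ("1", "p"))}

result = tableau_result(body, ("x", "z"), instance)
for t in sorted(sorted(t) for t in result):
    print(dict(t))
```

Here the variable $y$ is projected away, so the result is a relation over the scheme $\{x, z\}$ with two tuples.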

We finally recall join dependencies. Let $t_1$ and $t_2$ be tuples over the relation
schemes $U_1$ and $U_2$, respectively. If $t_1$ and $t_2$ agree on $U_1 \cap U_2$, the union
$t_1 \cup t_2$ (where we take the union of two functions, viewed as sets of pairs) is a
well-defined tuple over the relation scheme $U_1U_2$. The natural join $r_1 \bowtie r_2$, for
relations $r_1$ and $r_2$ over $U_1$ and $U_2$, respectively, then equals

\[ \{\, t_1 \cup t_2 \mid t_1 \in r_1 \text{ and } t_2 \in r_2 \text{ and } t_1[U_1 \cap U_2] = t_2[U_1 \cap U_2] \,\}. \]

Consider now any relation $r$ over some relation scheme $U$. Let $U_1$ and
$U_2$ be subsets of $U$ (not necessarily disjoint) such that $U = U_1U_2$. Then $r$
satisfies the join dependency (JD) $U_1 \bowtie U_2$ if $r = \pi_{U_1}(r) \bowtie \pi_{U_2}(r)$. Note that
the containment from left to right is trivial, so one only needs to verify the
containment $\pi_{U_1}(r) \bowtie \pi_{U_2}(r) \subseteq r$.
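
The definitions of projection, natural join and JD satisfaction translate directly into code. The Python sketch below is a minimal illustration on a single relation (it does not decide implication of a JD by a tableau query); the helper names `tup`, `project`, `natural_join` and `satisfies_jd` and the example relations are hypothetical.

```python
def tup(**values):
    """A tuple over a relation scheme, represented as a frozenset of (attribute, value) pairs."""
    return frozenset(values.items())


def project(r, attrs):
    """Projection of relation r onto the attribute set attrs."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in r}


def natural_join(r1, r2):
    """Unions of all pairs of tuples that agree on their common attributes."""
    out = set()
    for t1 in r1:
        for t2 in r2:
            d1, d2 = dict(t1), dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                out.add(t1 | t2)
    return out


def satisfies_jd(r, U1, U2):
    """Only the nontrivial containment of the join of the projections in r is checked."""
    return natural_join(project(r, U1), project(r, U2)) <= r


r = {tup(x="a", y="1", z="p"), tup(x="b", y="1", z="q")}
r_closed = r | {tup(x="a", y="1", z="q"), tup(x="b", y="1", z="p")}

print(satisfies_jd(r, {"x", "y"}, {"y", "z"}))
print(satisfies_jd(r_closed, {"x", "y"}, {"y", "z"}))
```

The first relation violates the JD because recombining its projections produces the tuples with $(x, z) = (a, q)$ and $(b, p)$, which are missing from it; adding them yields a relation that satisfies the JD.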

The logical implication of JDs by tableau queries is well understood and 
can be solved by the chase procedure with NP complexity gam. Formally, 
a tableau query Q = ( B,U) over S is said to imply a JD over U if for every 
instance I over S, the relation Q(I ) satisfies this JD. 

5.4 Decidability of sifo CQ logical entailment 

We consider two sifo CQs $Q$ and $Q'$ with the same head predicate:

\[ Q = T(\bar x, f(\bar z)) \leftarrow B \]
\[ Q' = T(\bar x', f'(\bar z')) \leftarrow B' \]

Remark 5.9. We assume $Q$ and $Q'$ to have their function symbol in the same
position in the head (here taken to be the last position). This is justified because
otherwise $Q$ could never logically entail $Q'$. To prove this, suppose the function
symbol in the head of $Q'$ is not in the last position. Then we have a
variable $x'$ from $B'$ in the last position. Now consider an instance $I$ such that
both $Q(I)$ and $Q'(I)$ are nonempty. (Such an instance could be constructed by
taking the disjoint union of $B$ and $B'$ and substituting constants for variables.)
Let $\rho$ be an oc-assignment for $Q(I)$ with respect to $I$. By Proposition 5.4 we
have $(I, \rho(Q(I))) \models Q$. In $\rho(Q(I))$, none of the elements in the last position of
a $T$-fact belongs to adom($I$). But then $(I, \rho(Q(I)))$ cannot satisfy $Q'$. Indeed,
since $Q'(I)$ is nonempty, there is a matching $\alpha' : B' \to I$. In any $J'$ such that
$(I, J') \models Q'$, there needs to be a $T$-fact with $\alpha'(x')$ in the last position, and
$\alpha'(x') \in$ adom($I$). We conclude that $Q$ does not logically entail $Q'$. □

In what follows we use $X$, $Z$ and $Z'$ to denote the sets of variables appearing
in the tuples $\bar x$, $\bar z$ and $\bar z'$, respectively.

We establish: 

Theorem 5.10. $Q$ logically entails $Q'$ if and only if there exists a
homomorphism $h : B \to B'$ satisfying the following conditions:

1. $h(\bar x) = \bar x'$;

2. $h(X \cap Z) \subseteq Z'$;

3. Let $Y_h := h^{-1}(Z')$, i.e., $Y_h = \{\, y \in \text{var}(B) \mid h(y) \in Z' \,\}$. Then the tableau
query $(B, XY_hZ)$ implies the join dependency $XY_h \bowtie Y_hZ$.

Proof of sufficiency. Let $(I, J) \models Q$, witnessed by the function $f$. We must
show $(I, J) \models Q'$. This means finding a function $f'$ witnessing the truth of
sotgd($Q'$) in $(I, J)$.

Call any two matchings $\alpha_1, \alpha_2 \in$ Mat($B$, $I$) equivalent if they agree on $Y_h$.
This is denoted by $\alpha_1 \equiv \alpha_2$. Let $\rho$ be any function from Mat($B$, $I$) to Mat($B$, $I$)
with the two properties, first, that $\rho(\alpha) \equiv \alpha$ and, second, that $\alpha_1 \equiv \alpha_2$ implies
$\rho(\alpha_1) = \rho(\alpha_2)$. Thus, $\rho$ amounts to choosing a representative out of each
equivalence class. We denote the application of $\rho$ by subscripting, writing $\rho(\alpha)$
as $\rho_\alpha$.

Let us define $f'$ as follows. Take any matching $\beta : B' \to I$. Then we put
$f'(\beta(\bar z')) := f(\rho_{\beta \circ h}(\bar z))$. To see that this is well-defined, recall that $h(Y_h) \subseteq Z'$.
Hence, $\beta_1(\bar z') = \beta_2(\bar z')$ implies that $\beta_1 \circ h \equiv \beta_2 \circ h$, so $\rho_{\beta_1 \circ h} = \rho_{\beta_2 \circ h}$.

We now show that this interpretation of $f'$ satisfies the requirements.
Specifically, let $\beta : B' \to I$ be a matching. We must show that $T(\beta(\bar x'), f'(\beta(\bar z'))) \in J$.
Consider the valuations $\delta_1 = \beta \circ h$ and $\delta_2 = \rho_{\beta \circ h}$, both belonging to Mat($B$, $I$),
and viewed as tuples over the relation scheme var($B$). Since these two
tuples agree on $Y_h$, also the two restrictions $\delta_1[XY_h]$ and $\delta_2[Y_hZ]$ agree on $Y_h$.
Since $X \cap Z \subseteq Y_h$, the union $\delta_1[XY_h] \cup \delta_2[Y_hZ]$ is a well-defined tuple over
$XY_hZ$. Since $\pi_{XY_hZ}$(Mat($B$, $I$)) satisfies the JD $XY_h \bowtie Y_hZ$, the union
belongs to $\pi_{XY_hZ}$(Mat($B$, $I$)). Hence, there exists a valuation $\gamma \in$ Mat($B$, $I$)
that agrees with $\beta \circ h$ on $X$, and with $\rho_{\beta \circ h}$ on $Z$. Since $(I, J) \models Q$, we have
$T(\gamma(\bar x), f(\gamma(\bar z))) \in J$. By the preceding, $\gamma(\bar x) = \beta(h(\bar x)) = \beta(\bar x')$ and
$f(\gamma(\bar z)) = f(\rho_{\beta \circ h}(\bar z)) = f'(\beta(\bar z'))$. We conclude that $T(\beta(\bar x'), f'(\beta(\bar z'))) \in J$ as desired.

Proof of necessity. Let $V'$ = var($B'$), and let $n$ be the arity of $f$. For each
$l \in \{0, 1, \dots, n\}$ and each $u \in V' - Z'$ we introduce a fresh copy of $u$, denoted
by $u^l$. We say that this fresh copy is “colored” with color $l$. For each variable
$u \in Z'$, we simply define $u^l$ to be $u$ itself. We say that the variables in $Z'$ are
“colored white”.

For any tuple of variables $\bar u = (u_1, \dots, u_p)$ in $V'$, we denote the tuple
$(u_1^l, \dots, u_p^l)$ by $\bar u^l$. In this tuple, all variables are colored $l$ or white. We then
define $B'^l = \{\, R(\bar u^l) \mid R(\bar u) \in B' \,\}$ and view it as an instance, i.e., the variables
$u^l$ are considered to be constants.

Now define the instance $I = \bigcup_{l=0}^{n} B'^l$ and construct the instance $J = Q(I)$.
By Proposition 5.4, $(I, J) \models Q$, where we omit the oc-assignment for the sake
of clarity. Since $Q$ logically entails $Q'$, also $(I, J) \models Q'$. Hence, there exists a
function $f'$ such that for each color $l$, using the matching $\mathrm{id}^l : B' \to I$, $u \mapsto u^l$,
the fact $T(\bar x'^l, f'(\bar z'^l)) = T(\bar x'^l, f'(\bar z'))$ belongs to $J$.

Since $J = Q(I)$, we have $f'(\bar z') = f(\bar w)$ for some tuple $\bar w$ of colored variables
in $V'$. Since the arity of $f$ is $n$ and there are $n + 1$ distinct colors, some color
does not appear in $\bar w$. Without loss of generality we may assume that this is
the color 0.

Let us now focus on the fact $T(\bar x'^0, f(\bar w))$ in $J$. Like any $T$-fact in $J$, this
fact has been produced by some matching $k : B \to I$ such that $T(\bar x'^0, f(\bar w)) =
k(T(\bar x, f(\bar z)))$, so

(a) $k(\bar x) = \bar x'^0$ and

(b) $k(\bar z) = \bar w$.

Let $s$ denote the mapping that removes colors, i.e., $s(u^l) = u$ for every
$u \in V'$ and every $l \in \{0, 1, \dots, n\}$. Since $s(I) \subseteq B'$, we have a homomorphism
$s \circ k : B \to B'$. We now define $h := s \circ k$ and show that it satisfies the conditions
required by the theorem. The first condition is clear since $h(\bar x) = s(k(\bar x)) =
s(\bar x'^0) = \bar x'$.

For the second condition, let $x \in X \cap Z$. By (a), $k(x)$ is colored 0 or white.
By (b), $k(x)$ is colored non-zero or white. Hence, $k(x)$ is colored white, i.e.,
$k(x) \in Z'$, so $h(x) = s(k(x)) = k(x) \in Z'$ as desired.

Finally, to show that $(B, XY_hZ)$ implies $XY_h \bowtie Y_hZ$ we must establish the
query containment

\[ (B, XY_h) \bowtie (B, Y_hZ) \subseteq (B, XY_hZ). \]

Treating tableau queries as conjunctive queries, and using the well-known
containment criterion for conjunctive queries, this amounts to showing the
existence of a certain homomorphism. More specifically, we express the query
$(B, XY_h) \bowtie (B, Y_hZ)$ by the conjunctive query with the body $B_2 = B_0 \cup B_1$
defined as follows. The body $B_0$ is obtained from $B$ by replacing each variable
$u$ not in $Y_h$ by a fresh copy $u^0$. For each $u \in Y_h$ we define $u^0$ simply as $u$
itself. The body $B_1$ is obtained from $B$ by replacing each variable $u$ not in $Y_h$
by a fresh copy $u^1$. Again, for each $u \in Y_h$ we define $u^1$ simply as $u$ itself. To
show the containment, we now must find a homomorphism $m$ from $B$ to $B_2$
such that each $u \in X - Y_h$ is mapped to $u^0$; each $u \in Y_h$ is mapped to $u$; and
each $u \in Z - Y_h$ is mapped to $u^1$.

Thereto, we define the following mapping $m$:

• if $k(u)$ is colored 0, then $m(u) := u^0$;

• if $k(u)$ is colored $l$ for some $l > 0$, then $m(u) := u^1$;

• if $k(u)$ is colored white, then $m(u) := u$.

Let us verify that $m : B \to B_2$ is a homomorphism. Consider an atom $R(\bar u)$ in
$B$; we must show $R(m(\bar u)) \in B_2$. Since $k : B \to I$, we know that $R(k(\bar u)) \in I$.
By definition of $I$, this means that $R(k(\bar u)) = R(\bar v^l)$ for some atom $R(\bar v)$ in $B'$
and some color $l$. So, for each variable $u$ in $\bar u$, the color of $k(u)$ is either $l$ or
white. We now distinguish two cases.

• If $k(u)$ is colored white, then $h(u) = k(u) \in Z'$ so $u \in Y_h$. Hence, in this
case, $m(u) = u = u^0 = u^1$.

• If $k(u)$ is colored $l$, then by definition $m(u) = u^0$ when $l = 0$, and
$m(u) = u^1$ when $l > 0$.

We conclude that $R(m(\bar u)) = R(\bar u^0) \in B_0$ when $l = 0$, and $R(m(\bar u)) = R(\bar u^1) \in
B_1$ when $l > 0$. Hence, since $B_2 = B_0 \cup B_1$, we always have $R(m(\bar u)) \in B_2$ as
desired.

It remains to verify that $m$ maps the variables in $XY_hZ$ correctly. If $u \in Y_h$,
then $h(u) = k(u) \in Z'$ so $k(u)$ is colored white and $m(u) = u$ as desired. If
$u \in X - Y_h$, then by (a), $k(u)$ is colored 0 so $m(u) = u^0$ as desired. Finally, if
$u \in Z - Y_h$, then by (b), $k(u)$ is colored $l > 0$ so $m(u) = u^1$ as desired. □
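
To make the construction of $B_2$ used in the proof above concrete, the following minimal Python sketch builds $B_0$, $B_1$, and their union from a body $B$ and a set $Y_h$. The encoding of atoms as (relation name, variable tuple) pairs, the `^0`/`^1` naming of the fresh copies, and the function names `copy_body` and `join_body` are assumptions made purely for illustration. The sketch also assumes that no variable name already contains `^`.

```python
from typing import Iterable

# An atom is a relation name together with a tuple of variable names.
Atom = tuple[str, tuple[str, ...]]


def copy_body(body: Iterable[Atom], shared: set[str], tag: str) -> set[Atom]:
    """Rename every variable outside `shared` to a fresh copy tagged with `tag`."""
    def rename(u: str) -> str:
        return u if u in shared else f"{u}^{tag}"
    return {(rel, tuple(rename(u) for u in args)) for rel, args in body}


def join_body(body: Iterable[Atom], y_h: set[str]) -> set[Atom]:
    """Return B_2 = B_0 ∪ B_1, the body expressing (B, XY_h) ⋈ (B, Y_h Z)."""
    body = list(body)
    return copy_body(body, y_h, "0") | copy_body(body, y_h, "1")


# Hypothetical example body B = {R(x, y), S(y, z)} with Y_h = {y}.
B = [("R", ("x", "y")), ("S", ("y", "z"))]
B2 = join_body(B, {"y"})
```

Variables in $Y_h$ are shared between the two copies, so in this small example `B2` contains the four atoms `R(x^0, y)`, `S(y, z^0)`, `R(x^1, y)`, and `S(y, z^1)`.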

As a corollary, we obtain that the complexity of deciding logical entailment 
for sifo CQs is not worse than that of deciding containment for classical CQs: 

Corollary 5.11. Testing logical entailment of sifo CQs is NP-complete. 

Proof. Membership in NP follows from Theorem 5.10: as a witness for logical
entailment we can use a homomorphism $h$ satisfying the first two conditions of
the theorem, together with a homomorphism $h_0$ from the query $(B, XY_hZ)$ to
the query $(B, XY_h) \bowtie (B, Y_hZ)$ witnessing the third condition of the theorem.
NP-hardness follows because the problem has containment of classical CQs as a
special case, which is well known to be NP-hard. Indeed, logical entailment of
a sifo CQ $Q'$ by a sifo CQ $Q$, in the special case where the creation functions of $Q$ and
$Q'$ are nullary, amounts to classical containment of $Q$ in $Q'$ when we ignore the
function terms in the heads. □
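
The hardness argument rests on the classical homomorphism criterion for containment of CQs: $Q$ is contained in $Q'$ exactly when there is a homomorphism from $Q'$ to $Q$ that maps head to head. A naive backtracking search for such a homomorphism might be sketched in Python as follows. This is an illustration only, not the procedure of the paper: the encoding of a query as a (head, body) pair, the assumption that all terms are variables, and the names `find_homomorphism` and `contained_in` are hypothetical.

```python
from typing import Optional

Atom = tuple[str, tuple[str, ...]]          # (relation name, variables)
Query = tuple[tuple[str, ...], list[Atom]]  # (head variables, body atoms)


def find_homomorphism(src: list[Atom], dst: list[Atom],
                      fixed: dict[str, str]) -> Optional[dict[str, str]]:
    """Search for a variable mapping extending `fixed` that sends every
    atom of `src` to some atom of `dst`."""
    def extend(i: int, h: dict[str, str]) -> Optional[dict[str, str]]:
        if i == len(src):
            return h
        rel, args = src[i]
        for rel2, args2 in dst:
            if rel2 != rel or len(args2) != len(args):
                continue
            h2 = dict(h)
            # setdefault binds an unbound variable or returns the old binding.
            if all(h2.setdefault(u, v) == v for u, v in zip(args, args2)):
                result = extend(i + 1, h2)
                if result is not None:
                    return result
        return None
    return extend(0, dict(fixed))


def contained_in(q: Query, q_prime: Query) -> bool:
    """Classical test: q ⊆ q_prime iff q_prime maps homomorphically into q."""
    (head, body), (head_p, body_p) = q, q_prime
    if len(head) != len(head_p):
        return False
    fixed: dict[str, str] = {}
    for u, v in zip(head_p, head):
        if fixed.setdefault(u, v) != v:
            return False
    return find_homomorphism(body_p, body, fixed) is not None


# Hypothetical queries: q1 = ans(x) <- R(x, y), R(y, x) and q2 = ans(x) <- R(x, y).
q1: Query = (("x",), [("R", ("x", "y")), ("R", ("y", "x"))])
q2: Query = (("x",), [("R", ("x", "y"))])
```

Here `contained_in(q1, q2)` succeeds while `contained_in(q2, q1)` fails. The search is exponential in the worst case, in line with NP-hardness, whereas checking a given mapping takes polynomial time.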

5.5 From oid-equivalence to logical entailment 

Let $Q$ and $Q'$ be sifo CQs of the general forms from Subsection 4.4.
From our main Theorems 4.19 and 5.10 we can conclude the following.

Theorem 5.12. If Q and Q' are oid-equivalent, then Q logically entails Q'. 

Proof. By Theorem 4.19 there exists a permutation $\pi$ of $Z - X$ such that
$Q^\pi$ and $Q'$ are equivalent. Hence there is a homomorphism $h\colon Q^\pi \to Q'$. Clearly
$h\colon B \to B'$. We verify that $h$ satisfies the conditions of Theorem 5.10, thus
showing that $Q$ logically entails $Q'$.

1. Since $h$ maps the head of $Q^\pi$ to the head of $Q'$, we have $h(\bar x) = \bar x$ and
$h(\pi(\bar z)) = \bar z$. Since $\bar x' = \bar x$, we have $h(\bar x) = \bar x'$ as desired.

2. Since $h$ is the identity on $X$, we have $h(X \cap Z) = X \cap Z \subseteq Z = Z'$ as
desired.

3. Since $h(\pi(\bar z)) = \bar z$ and $\pi(Z) = Z$, we have $h(Z) = Z = Z'$. Hence $Z \subseteq Y_h$.
But then the join dependency $XY_h \bowtie Y_hZ$ becomes $XY_h \bowtie Y_h$, which
trivially holds. □

6 Discussion


The results in this paper provide an understanding of the notions of oid-equivalence
and logical entailment for sifo CQs. Sifo CQs, however, form a very simple subclass
of oCQs. Moreover, oCQs themselves are rather limited: for example, they
consist of a single rule, and the rule can have only one atom in the head. Thus
there are at least three natural directions for further research: (i) allowing more
than one function in the head; (ii) allowing more than one atom in the head;
(iii) allowing more than one rule.

Containment. Furthermore, in addition to oid-equivalence of oCQs, it would
be natural to also investigate a notion of oid-containment. There are actually
at least two reasonable ways to define such a notion. The situation is similar
to that in research on CQs with counting or bag semantics [Tol HJ. Most of 
the known results are for equivalence only, with the extension to containment 
typically an open problem. Indeed, our characterization of oid-equivalence for 
sifo CQs relies on equivalence of CQs with bag semantics. An extension to
oid-containment will likely need a similar advance on containment of CQs with bag
semantics.

Sifo CQs and ILOG. In the introduction we mentioned that sifo CQs, and
oCQs in general, are a fragment of ILOG without recursion [22]. Sifo CQs
belong to the subclass of recursion-free ILOG programs “with isolated
oid creation” [23]. For this class, oid-equivalence was already known to
be decidable. This was shown by checking all finite instances up to some
exponential size. Hence, our NP-completeness result for oid-equivalence of sifo
CQs does not follow from the previous work. More generally, the decidability of
oid-equivalence for general recursion-free ILOG programs, or already of oCQs
for that matter, is a long-standing open question. Various interesting examples
showing the intricacies of this problem have already been given by Hull and
Yoshikawa [23].

Sifo CQs and nested dependencies. Earlier in the paper we also presented sifo
CQs, now viewed as schema mappings, as a very simple subclass of nested tgds.
The implication problem for general nested tgds was shown to be decidable by
Kolaitis et al. [26] in work done independently from the present paper.
Nevertheless, our characterization of implication for sifo CQs, given by Theorem 5.10,
does not follow from the general decision procedure for nested tgds. Instead,
the general procedure, when applied to two sifo CQs, is strikingly similar to
our proof of necessity of our theorem. Using the notation from that proof, the
general procedure applied to test implication of sifo CQ $Q'$ by sifo CQ $Q$ would
amount to testing for the existence of a homomorphism $h$ from
$\{T(\bar x'^l, f'(\bar z')) \mid l = 0, \dots, n\}$ to $Q(I)$. Since
$Q(I) = \{T(\alpha(\bar x), f(\alpha(\bar z))) \mid \alpha\colon B \to I\}$, this
can be implemented by guessing $h$ and $n + 1$ matchings $\alpha_l\colon B \to I$ such that
$(h(\bar x'^l), f'(h(\bar z'))) = (\alpha_l(\bar x), f(\alpha_l(\bar z)))$ for $l = 0, \dots, n$. In contrast, as explained
in Corollary 5.11, our characterization involves guessing just two homomorphisms.

Table 9: Instances used to illustrate logical entailment in the presence of multiple
functions.

Sifo CQs and plain SO-tgds. As described earlier, sifo CQs are a very
simple subclass of plain SO-tgds. For plain SO-tgds, deciding logical equivalence
is again an open problem. Also, the notion of oid-equivalence, defined in this
paper for oCQs, can be readily extended to plain SO-tgds. We illustrate some
difficulties involved in allowing multiple functions in the head, which is indeed
allowed in plain SO-tgds. First, consider the oid-equivalence problem. For sifo
CQs we have shown in Section 4.4 of this paper that, as far as oid-equivalence
is concerned, only the counts of generated oids per tuple are important. Now
consider the following pair of oCQs:

$$Q = T(x, f(y), g(x, z)) \leftarrow R(x, y), R(x, z)$$

$$Q' = T(x, f(y), g(x, y)) \leftarrow R(x, y), R(x, z)$$

Both queries create the same number of new $f$-oids and $g$-oids per $x$-value, but
now it also becomes important how these oids are paired. In $Q$ more pairs are
generated for each $x$, and the two queries are not oid-equivalent. So, in the case
of multiple functions, also the interaction between the multiple terms needs to
be taken into account in some way.
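
The difference in pairing can be observed on a tiny instance. The following Python sketch evaluates both queries, representing each created oid as a tagged tuple such as `("f", y)`. That representation, the function names `eval_q` and `eval_q_prime`, and the instance $R = \{(a, 1), (a, 2)\}$ are assumptions chosen for illustration; they are not taken from the paper.

```python
Pair = tuple[str, int]


def eval_q(R: set[Pair]) -> set[tuple]:
    """Q = T(x, f(y), g(x, z)) <- R(x, y), R(x, z)."""
    return {(x, ("f", y), ("g", x, z))
            for (x, y) in R for (x2, z) in R if x2 == x}


def eval_q_prime(R: set[Pair]) -> set[tuple]:
    """Q' = T(x, f(y), g(x, y)) <- R(x, y), R(x, z)."""
    return {(x, ("f", y), ("g", x, y))
            for (x, y) in R for (x2, z) in R if x2 == x}


def oids(T: set[tuple], position: int) -> set[tuple]:
    """Distinct oids occurring at the given position of the result tuples."""
    return {t[position] for t in T}


# Hypothetical instance: the x-value "a" is related to 1 and to 2.
R = {("a", 1), ("a", 2)}
T, T_prime = eval_q(R), eval_q_prime(R)

print(len(oids(T, 1)), len(oids(T, 2)), len(T))
print(len(oids(T_prime, 1)), len(oids(T_prime, 2)), len(T_prime))
```

On this instance both results contain two $f$-oids and two $g$-oids, but $Q$ yields four tuples (every $f$-oid paired with every $g$-oid) while $Q'$ yields only two. Since no renaming of oids can turn one result into the other, the instance separates the two queries.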

A similar comment applies to the problem of logical equivalence. It is not
immediately clear how the join dependency condition of Theorem 5.10 should
be generalized in the presence of multiple functions. Consider, for example, the
following:

$$Q = T(x, f_1(z_1, y_1), f_2(z_2, y_2)) \leftarrow R(x, z_1, z_2), S(z_1, y_1), S(z_2, y_2)$$

$$Q' = T(x, g_1(u), g_2(u)) \leftarrow R(x, u, x), R(x, x, u), S(u, v_1), S(x, v_2)$$

The $f_1$-part of $Q$ (ignoring the third component in the head) logically entails
the $g_1$-part of $Q'$, and likewise the $f_2$-part of $Q$ (ignoring the second component
in the head) logically entails the $g_2$-part of $Q'$. Globally, however, $Q$ does not
logically entail $Q'$; this can be seen by the instances shown in Table 9, which
satisfy $Q$ but not $Q'$.

A related interesting question then is whether Theorem 5.12, that
oid-equivalence implies logical entailment, still holds for plain SO-tgds. When we
allow nested function terms in the head (which goes beyond plain SO-tgds) the
implication breaks down, as shown by the following example na Example 3.8]: 


$$Q = T(x, f(x), g(f(x))) \leftarrow S(x)$$

$$Q' = T(x, f(x), g(x)) \leftarrow S(x)$$

Here $Q$ and $Q'$ are oid-equivalent, and $Q$ logically entails $Q'$, but $Q'$ does not
logically entail $Q$.
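
The failure of entailment from $Q'$ to $Q$ can be checked by brute force on a small instance. The Python sketch below reads both queries as second-order statements in which $f$ and $g$ are existentially quantified, and searches all candidate functions over the values of the instance. The instance $S = \{a, b\}$, $T = \{(a, 1, 2), (b, 1, 3)\}$ is an assumption chosen here for illustration and is not taken from the cited example; the helper names are hypothetical as well. Restricting function values to the instance's domain loses nothing, because every required tuple must already occur in $T$.

```python
from itertools import product

# Hypothetical instance: f must send both a and b to 1, yet g must differ.
S = {"a", "b"}
T = {("a", 1, 2), ("b", 1, 3)}
DOM = list(S | {v for t in T for v in t})


def functions(args, values):
    """Enumerate all total functions from `args` to `values` as dicts."""
    args = list(args)
    for image in product(values, repeat=len(args)):
        yield dict(zip(args, image))


def satisfies_q_prime() -> bool:
    """Q' = T(x, f(x), g(x)) <- S(x): do suitable f and g exist?"""
    return any(all((x, f[x], g[x]) in T for x in S)
               for f in functions(S, DOM) for g in functions(S, DOM))


def satisfies_q() -> bool:
    """Q = T(x, f(x), g(f(x))) <- S(x): g is applied to the value f(x)."""
    return any(all((x, f[x], g[f[x]]) in T for x in S)
               for f in functions(S, DOM) for g in functions(DOM, DOM))


print(satisfies_q_prime(), satisfies_q())
```

The instance satisfies $Q'$ (take $f(a) = f(b) = 1$, $g(a) = 2$, $g(b) = 3$) but not $Q$: any witness for $Q$ would need $f(a) = f(b) = 1$ and then both $g(1) = 2$ and $g(1) = 3$.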